Assignment title: Information
A Report submitted in partial fulfilment of the regulations governing the award of the degree of BSc (Honours) Computer Science at the
University of
Northumbria at Newcastle
Project Report (CM0645)
Player Tracking for Tennis Video Analysis
2015 / 2016
General Computing Project
Authorship Declaration
DECLARATIONS
I declare the following:
(1) That the material contained in this dissertation is the end result of my own work and that due acknowledgement has been given in the bibliography and references to ALL sources be they printed, electronic or personal.
(2) The Word Count of this Dissertation is 16,716.
(3) That unless this dissertation has been confirmed as confidential, I agree to an entire electronic copy or sections of the dissertation being placed on the eLearning Portal (Blackboard), if deemed appropriate, to allow future students the opportunity to see examples of past dissertations. I understand that if displayed on the eLearning Portal it would be made available for no longer than five years and that students would be able to print off copies or download.
(4) I agree to my dissertation being submitted to a plagiarism detection service, where it will be stored in a database and compared against work submitted from this or any other School or from other institutions using the service.
In the event of the service detecting a high degree of similarity between content within the service this will be reported back to my supervisor and second marker, who may decide to undertake further investigation that may ultimately lead to disciplinary actions, should instances of plagiarism be detected.
(5) I have read the Northumbria University/Engineering and Environment Policy Statement on Ethics in Research and Consultancy and I confirm that ethical issues have been considered, evaluated and appropriately addressed in this research.
SIGNED:
Abstract
Object tracking has attracted the attention of many researchers. Player tracking in a video sequence is significant for many applications, including automatic video surveillance, autonomous robotic systems and human-computer interfaces. There are various tracking algorithms, which implement methods for object representation based on texture, feature and shape models. To perform video tracking, an algorithm analyses sequential video frames. There is a variety of algorithms and each has limitations, so the algorithm must be selected according to the field in which the tracking will be used. This project describes various techniques and methods used for detecting and tracking objects.
Tracking sports players by colour works correctly provided that the colour of the target is unique relative to the background. Tracking players using the contour as a feature is effective even for non-rigid targets. Spatial histogram tracking gives satisfactory results even when the target changes in scale or the background has a similar colour. In this project, robust techniques based on contour, colour and spatial histograms have been examined, presented and implemented.
This project introduces algorithms for tracking tennis players for an automatic tennis video system. The tracking system detects and tracks the players using the mean shift algorithm together with multiple object tracking by image subtraction. The common mean shift tracking algorithm assumes that the target object is distinct from the background, but this assumption does not hold when the tracker runs against dynamic backgrounds such as those in sports videos. In order to determine the players' movements, the system has to detect and track their positions using appropriate algorithms, including frame differencing and optical flow, and apply correlation methods to remove false detections. The results show that the techniques are able to classify both players' movements.
This project describes the implementation of automatic player tracking for sports analysis. Tracking objects is a significant objective in computer vision. Several techniques have been developed for automatically monitoring sports players, pedestrians and other moving objects. The main challenge in this project is choosing appropriate models and features for tracking and identifying the target objects.
Contents
Introduction
Analysis
Overview of the system
Description of visual information processing
The Strategy
Introduction to Object Detection and Tracking
Concept of object tracking
Existing Tracking Methods
Object tracking applications
Multi object tracking
Layering based tracking
Ground-Truth Data Collection
Related work
Challenges
Synthesis
Design
User Interface Design
Implementation
Difficulties
Testing
Evaluation
User Study
Comparison with existing algorithms
Results
Conclusion
Recommendations for further work
References
Appendices
Introduction
The goal of this project is to track tennis players, which can be a challenging task. To analyse a match at a high level, it is essential to find the positions of the players. The first step of the player tracking algorithm is segmentation, which creates separate background models for the playing field and for the surrounding area according to their different colours. This allows the system to predict the positions in the current frame more accurately.
In computer vision, object tracking is a challenging task. The arrival of powerful computers, high-definition camcorders and applications that require automated video analysis has greatly increased interest in these technologies. Video analysis has three main steps: detecting the moving objects of interest, tracking those targets across successive frames, and analysing the tracks to study the behaviour and movement of the objects.
Tracking systems are required in the following areas:
• Motion-based recognition, which includes identifying humans from their movement over time, automatic object detection, etc.
• Human-computer interaction, including gesture recognition and body movement tracking for data input, among others.
• Traffic monitoring, which involves gathering traffic statistics in real time in order to control the flow of traffic.
• Automated vehicle navigation, that is, finding a path along the road while avoiding obstacles.
The tracking task is to find the path of an object in the frame as it moves around the scene. Equally, an object tracker keeps the labels of the tracked objects consistent across subsequent frames. In addition, depending on the tracking application, the tracker can provide extra object-specific information such as shape and area. The list below presents the issues that make tracking challenging:
• Loss of information due to the 2D image,
• Image noise,
• Complex movement of the object,
• Occlusion and the complex shape of the target,
• The requirement for real-time tracking.
Over the years many methods for tracking objects have been developed, and these methods and algorithms differ from each other mainly in their approach. The speed and size of the player can pose real problems, and the chance of false detection is high. The technique used here detects motion and applies image template matching to track the player. Kalman filters and particle filters will be considered for predicting the location of foreground objects. The rapid growth of sports video databases demands efficient tools for automatic sports video annotation, and recent advances in computer vision algorithms have made it possible to create such tools. Tennis video annotation systems have many potential applications, such as video retrieval, enhanced broadcast, object-based video encoding, video streaming, analysis of player tactics and computer-aided coaching. After objects have been correctly classified and tracked, an algorithm may need to analyse the object motion, such as the gait of a moving human. This can be difficult for non-rigid objects such as humans if the object is not viewed from the perspective the algorithm expects.
In automatic annotation, high-level descriptions rely on low-level features. In the context of a tennis game, the player trajectories carry important information about the video. This is the motivation for the work presented in this project: tennis player tracking for an automatic tracking system.
The list of Algorithms
1. Extract background image.
2. Frame difference method
3. Blob analysis
4. Detect and track players.
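The four steps above can be illustrated with a minimal sketch. It is not the project's implementation: it uses plain Python lists as greyscale frames instead of OpenCV images, and the function names (frame_difference, find_blobs), the threshold of 30 and the minimum blob area of 2 pixels are assumptions chosen only for illustration.

```python
from collections import deque


def frame_difference(background, frame, threshold=30):
    """Return a binary mask that is 1 where the frame differs from the background."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def find_blobs(mask, min_area=2):
    """Group 4-connected foreground pixels into blobs and return their
    bounding boxes as (x_min, y_min, x_max, y_max)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Very small blobs are treated as noise and dropped.
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


# Step 1: a uniform background (assumed to have been extracted already).
background = [[10] * 8 for _ in range(6)]

# A frame with two bright "players" and one isolated noisy pixel.
frame = [row[:] for row in background]
for y, x in ((0, 1), (1, 1)):
    frame[y][x] = 200
for y, x in ((4, 5), (4, 6), (5, 5), (5, 6)):
    frame[y][x] = 180
frame[2][7] = 90

# Steps 2 to 4: difference the frame, analyse the blobs, report the players.
mask = frame_difference(background, frame)
players = find_blobs(mask)
print(players)
```

In this toy frame the isolated pixel is rejected by the area test, so only the two larger regions are reported as players.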
Computer vision aims to enable computers to perceive motion and to understand scenes in the way human vision does. Visual object tracking has emerged as an important and challenging topic in computer vision. The core of visual object tracking is to estimate the motion of the object in each frame of the input image sequence. Tracking is defined as following the motion of an object moving under the action of given forces. With the growing use of computer-based applications, object tracking has become very useful for surveillance, healthcare, traffic monitoring, autonomous robotic systems and human-computer interfaces. In surveillance systems, visual object tracking is used to detect and track suspicious behaviour. In traffic flow monitoring, object tracking is used to track vehicles and monitor the flow of traffic so as to avoid jams. Another application of object tracking is video compression. For example, video object tracking is applied in banks, parking lots, residential areas and shopping centres to monitor human activity. Tracking can be complex because of the articulated nature of the object, and occlusion can also be a major problem. A video is a sequence of images, each called a frame, displayed at a frequency high enough that our eyes perceive the content as continuous.
A general object detection algorithm could be appropriate, but handling unrecognised objects in the system would be quite challenging. Object detection is used to find the objects of interest in the video sequence and to cluster the pixels that belong to them. Detection can be carried out with appropriate techniques such as multiple object tracking by image subtraction, mean shift, frame differencing and optical flow.
The approaches to classifying the objects include motion-based, shape-based, texture-based and colour-based classification. Tracking can be defined as the problem of estimating the path of an object through the video sequence as it moves around the scene. The tracking approaches to be considered include multiple object tracking by image subtraction, kernel tracking, point tracking and silhouette tracking. This project is structured in the following list:
I have chosen to use OpenCV as the library for analysis. OpenCV is very well suited to tracking objects, which is what this project is all about. Multi-object tracking will be considered for this project because the system has to track two players at the same time. When an object is not detected in a frame but is detected in the previous and following frames, a correct trajectory can still be produced. Likewise, the system can ignore false-positive detections that appear in only a few frames. However, with multiple targets the problem becomes harder, because the optimisation has to be carried out over the space of all possible families of trajectories.
Multi-object tracking can be decomposed into two separate steps that address independent issues. The first is time-independent detection, in which a prediction scheme infers the number and locations of targets from the signal at every time instant independently. This usually involves either a generative model of the signal given the target or a discriminative, machine-learning-based algorithm. The second step involves modelling target motions and detection errors.
Analysis
Overview of the system
First, the playing scene is detected, which involves selecting the sequences that show the tennis playing field. Court detection identifies the location of the court in the scene and provides specific information such as its shape, size and position. The player tracking technique determines the speed and position of the players, which are required for analysing the players' tactics.
The tracking algorithm needs the court coordinates in order to relate the player positions to the coordinates of the image. Once this is implemented, the system can analyse statistics and tactics, such as the distance the players run.
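As a small illustration of such a statistic, the sketch below sums the distance between consecutive tracked positions. The function name distance_travelled and the single metres_per_pixel scale are assumptions made for this example; a single scale ignores camera perspective, so a real system would first map image positions onto court coordinates.

```python
import math


def distance_travelled(positions, metres_per_pixel=1.0):
    """Sum the straight-line distance between consecutive (x, y) positions."""
    total_pixels = 0.0
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        total_pixels += math.hypot(x1 - x0, y1 - y0)
    return total_pixels * metres_per_pixel


# Hypothetical per-frame positions of one player, in image pixels.
track = [(0, 0), (3, 4), (3, 10), (11, 16)]

pixels_run = distance_travelled(track)
metres_run = distance_travelled(track, metres_per_pixel=0.05)
print(pixels_run, metres_run)
```

Dividing each step by the time between frames would give the player's speed in the same way.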
Description of visual information processing
The system will use video motion tracking as a creative medium that can turn any dynamic surface into a video display. It will also use dedicated software to mask and warp the image during motion tracking so that it fits the surface exactly. The end result will be a dynamic motion tracking installation that goes beyond common video motion tracking techniques.
The system will draw on the tracking and mapping systems used for pooling. We shall use the Christie 3-chip DLP projector, which copes well with ambient lighting in water environments with no impact on the final ratio. The contrast ratio to be applied is 16.9, which gives good results and reduces the number of projectors required, since we need fewer projectors than are usual in motion tracking for tennis players (Plass & Armin, 2006).
One of our presentations will feature 360-degree mapping motion tracking of a tennis player. The video mapping technique will use lighting to project imagery of newspaper articles onto the tennis player's body. This will be supplemented by the Vauxhall Adam Infinity sound system, which produces 7-channel sound with a powered amplifier and subwoofers mounted in the boot.
To solve the challenges that we may meet, we shall apply electrosonic techniques for custom motion tracking. These range from the use of miniature projectors in narrow spaces to the largest screen systems, normally designed as a replacement for large-format motion picture projectors (Mackenzie, 2009). The screens will be flat, multi-faceted or domed, since we aim to exploit architectural motion tracking as well.
Front and rear motion tracking will be handled with specialist surfaces such as water screens and Pepper's ghost. This will allow us to use the electrosonic technique even with standard equipment, by customising the lens designs.
As far as an account of phenomenal consciousness is concerned, representationalist theories of the mind are based on models from the information processing paradigm. These correspond closely to neurobiological or functional theories, and at this point we are confronted with several arguments based on inverted or absent qualia.
Such arguments follow a common pattern: they assume complete knowledge of the neural and functional states that underlie the occurrence of phenomenal consciousness. It can still be conceived that neural states with the same causal role or the same representational function (Gilbert & Li, 2013) occur with no phenomenal content at all, or are accompanied by phenomenal contents that differ widely from the usual ones.
By definition, visual information processing comprises the visual cognitive skills that allow us to process and interpret the meaning of the visual information we receive through eyesight. Visual perception therefore plays a vital role in cognitive skills such as spelling, mathematics and reading (Plass & Armin, 2006). Visual perceptual deficits, on the other hand, can lead to difficulties in learning, recognising and remembering letters and words, to confusion between similar items or minor variations, and to difficulty in distinguishing the main idea from insignificant details.
Visual perceptual processing can be subdivided into visual discrimination, figure-ground perception, closure, memory, sequential memory, constancy, spatial relations and visual-motor integration. Perception should be understood as the active process of locating and extracting information from the environment, while learning is the process of acquiring information through experience and storing it (Mackenzie, 2009). Thought, in turn, is the manipulation of information to solve problems.
The easier it is to extract information (perception), the easier the thought processes become. Overall, it is accepted that human vision is an extremely powerful form of information processing that facilitates our interaction with the world around us. However, despite extensive research efforts across multiple fields, the underlying fundamentals and operating principles of visual information processing remain largely unknown (Nassi & Callaway, 2009).
We are still not able to trace in full the route from the eyes to the sensory input area of the cortex. It is in this area that the input is converted into a meaningful representation of objects that the brain can consciously manipulate. Nearly half of the human cerebral cortex is devoted to processing visual information, yet despite extensive research efforts the puzzle persists. Present theories hold that human visual information processing is an interplay of two inversely directed processing streams (Gilbert & Li, 2013).
These include a top-down directed process that conveys the rules and knowledge guiding the linking and binding of disjointed pieces of information into perceptually meaningful object images. Importantly, this proposition is not completely new: past work has depicted the "faculty of appreciation" as a synthesis of "two constituents", the raw sensory data and the cognitive "faculty of reason."
Conditions that impair visual information processing
Past research has demonstrated that alcohol and dyslexia are among the factors that impair information processing, although it remains unknown whether the impairment affects all levels of information processing or only the early stages rather than the later ones. People with dyslexia suffer from attention deficits that impair their ability to select among incoming visual information. The early levels of information processing are described as those that involve detecting and responding to simple stimuli (Nassi & Callaway, 2009).
One task for assessing this function is inspection time, which has previously been shown to be sensitive to pharmacological agents. It is also among the most reliable, validated and culture-fair information processing measures of cognitive ability. Past findings on the effect of nicotine on information processing have generally been regarded as measuring speed in the early levels of information processing. These include the speed of visual encoding, that is, the ability to inspect the sensory input on which a discrimination of relative magnitude rests.
This contrasts with tasks such as reaction time, which are more response-oriented measures of complete decision-making time and cover the whole of information processing. Although no research has examined the effect of alcohol on this particular task, a limited number of studies have examined the effects of alcohol on the early stages of information processing using other tasks. Using visual tracking tasks, it was found that the speed of detection was impaired by alcohol and that the effect was greater in dual-task settings than in single-task settings. These results have been described as showing the deleterious effect of alcohol on central processing capacity and on the information processing capacity available over time.
Further investigations of early information processing have examined the mismatch negativity component of the auditory event-related potential, and report that a low dose of alcohol attenuates the event-related potential signal (Gilbert & Li, 2013). In this case, the suppression of the mismatch negativity component was strongest when the stimulus deviation was small, even at a relatively low blood alcohol concentration.
The detection of small deviations, such as that required in the inspection time task, may therefore be particularly impaired, and similar results have been found in simple reaction time tasks with two levels of stimulus intensity. These studies found an increase in response time as well as impaired stimulus detection, which suggests an influence on sensory-perceptual processes and on attention.
Current trends in the research of visual information processing and the advancement in understanding of visual information processing
Current discoveries in the field of visual information processing reflect the elementary principles of vision and the use of visual information in cognition. This work is on the verge of further development, with grounds for optimism in several sophisticated computational theories that incorporate neurobiological and behavioural data (Nassi & Callaway, 2009).
These theories have flourished through the skilful use of neuro-imaging and computational simulation technologies, which make it possible to answer subtle questions about the component subsystems of vision. The aim is to build on previous progress and future opportunities, with an emphasis on human vision and without attempting exhaustive coverage of research strategies.
The Strategy
Tracking algorithms can be broadly categorised into two main approaches: Track before Detection (TBD) and Track after Detection (TAD).
In the Track after Detection technique, the tennis players are detected in each frame. After this step, only the detections are used for tracking; the rest of the image is discarded. In such an approach, once the players are detected, the tracking problem has two components: data association and estimation. The Track after Detection approach is suitable for tracking small, fast-moving objects. Typical examples can be found in aerospace and defence systems. A tennis player is a fast-moving object, so the Track after Detection approach is the more appropriate one for tracking tennis players. One of the most important tasks is adapting the Track after Detection approach to the tennis player tracking problem. Each player is represented as a two-dimensional point in the image sequence without resolvable features. We then try to solve a pure data association problem using only the positions of these players. Another decision about the overall strategy is to make the tracker as generic as possible. A generic tracker is then also of benefit when tracking the players.
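The data association step can be sketched as follows, treating each player as a two-dimensional point. This is only one simple option (greedy nearest-neighbour matching with a distance gate), not necessarily the method used in this project; the function name associate, the track names and the gate of 50 pixels are assumptions for illustration.

```python
import math


def associate(tracks, detections, max_distance=50.0):
    """Greedy nearest-neighbour data association.

    tracks: dict mapping a track name to its last known (x, y) position.
    detections: list of (x, y) points detected in the current frame.
    Returns a dict mapping track names to the detection assigned to them.
    Detections further than max_distance from every track stay unassigned
    and can be treated as false detections.
    """
    # All candidate pairings, cheapest (closest) first.
    candidates = sorted(
        (math.dist(position, detection), name, index)
        for name, position in tracks.items()
        for index, detection in enumerate(detections)
    )
    assigned = {}
    used = set()
    for distance, name, index in candidates:
        if distance > max_distance:
            break  # every remaining pair is even further apart
        if name in assigned or index in used:
            continue
        assigned[name] = detections[index]
        used.add(index)
    return assigned


# Last known positions of the two players (hypothetical values).
tracks = {"near_player": (100, 400), "far_player": (110, 80)}

# Detections in the next frame, including one false detection.
detections = [(112, 84), (300, 300), (96, 395)]

result = associate(tracks, detections)
print(result)
```

The estimation component (for example a Kalman filter) would then smooth each assigned position and predict where to look in the next frame.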
Introduction to Object Detection and Tracking
Players are described by their salient features, for example colour, shape, texture or other characteristics. The problem is then to tell whether an image contains the object so defined and, if so, to indicate its position in the image. The video sequence is processed so that the size and position of the players can be tracked. Tracking exploits the high degree of correlation between consecutive frames in the sequence. When the players disappear from the video sequence, the detection task starts again.
Player detection can be viewed as a classification problem: the task is to tell whether a specific player is present in or absent from an image. If a player is present, the position of the player should be provided. Classification among the detected players is usually treated separately. Examples of object detection in images include detecting human bodies, cars and road signs in traffic scenes. Recognition, on the other hand, takes place when the system identifies a particular person or a particular road sign. Because detection depends on classification, the features used to describe an object are important. The primary objective is to pick features that are highly discriminative, which allows the classifier to respond accurately.
Concept of object tracking
To analyse a tennis video at a higher semantic level, it is important to know where the court is, where the players are, and the relative position of the court and the players. Object tracking is a critical task in the field of computer vision. Its purpose is to locate one or more moving objects over time using the video sequence. An algorithm analyses the video frames and outputs the location of the moving targets within the video sequence. The main task is to find and follow a moving object or targets in image sequences. The availability of powerful computers and the increasing need for automated video analysis have generated a great deal of interest in visual object tracking algorithms. Visual object tracking is applicable to tasks such as automated traffic monitoring, vehicle navigation and human-computer interaction. Automated video surveillance deals with the real-time observation of people or objects in a busy or restricted environment, leading to tracking and activity analysis of the subjects in the field of view. There are three key steps in video surveillance: detecting the moving objects of interest, tracking the objects from frame to frame, and analysing the object tracks to identify their behaviour.
Object tracking follows the segmentation step and is more or less equivalent to the recognition step in image processing. Detection of moving objects in video streams is the first significant step of information extraction in most computer vision applications. There are mainly three approaches to visual object tracking. Feature-based methods aim to extract features such as points and line segments from the image sequence; the tracking stage is then carried out by a matching procedure at each time instant. Differential methods are based on optical flow computation, that is, on the apparent motion in image sequences, under some regularisation assumptions. The third class uses correlation to measure inter-image displacements. The choice of a particular approach depends largely on the problem domain.
The availability of video technology has increased in recent years and has inspired a great deal of work on object tracking in video sequences. Many researchers have tried different approaches to object tracking. The nature of the method used depends largely on the application domain. Some of the research work in the field of visual object tracking includes the block matching method for object tracking in traffic scenes.
Object tracking
Finding the path of a moving object across consecutive frames of a video captured in real time is very challenging. Tracking an object consists of two basic steps, namely target representation and target localisation. The former depends on the model of the target object, while the latter deals with the method of searching for the target in subsequent frames. Colour histograms are very popular among the models used for target representation. Target localisation relies on the model produced by the representation step. A tracking system based on the contour of the object gives good results for non-rigid objects.
For a particle-based tracker to perform well, a large number of particles is required. If the system has to track more than one object at the same time, it may fail because of the high computational requirements of this technique.
Another method tracks the contour using a level matrix and two linked lists. The elements of the linked lists are switched to adapt the contour. Although this algorithm has relatively low computational complexity, the non-parametric representation of the contour tends to weaken the constraints on the contour.
If the background contains the same colour as the target object, the contour can expand into the background, which may lead to tracking errors. If rich textures are present, feature points can be used and give very good results. One commonly used method for tracking a feature point is the iterative Newton-Raphson minimisation algorithm. The algorithm searches for the corresponding feature points in subsequent frames. When implemented in optimised C/C++, tracking is faster and has proved more reliable. However, the track will fail if partial occlusion occurs or if the object's movement involves many rotations.
It has been shown that representing the target by its colour histogram performs well for tracking even under partial occlusion or deformation of the object.
One very popular technique, called "Camshift", uses a colour histogram of the target object to track it in subsequent frames. It uses an iterative procedure based on the mean shift to find the boundary of the object in subsequent frames. In each frame, the position of the boundary is changed until it finally converges.
This technique was developed primarily to track faces, but it can be used for other purposes as well. Another technique, known as the Kernel-Based Tracking (KBT) algorithm, extracts the colour of the object using a kernel-weighted colour histogram. Here, pixels on the border of the object are given smaller weights, while pixels around the object centre are given higher weights.
Here too, the target is localised by applying the mean shift method iteratively (as in Camshift). Algorithms based on the mean shift have proven to work very well with the kernel-weighted histogram. However, a serious shortcoming of these algorithms is that performance deteriorates when the tracked object undergoes a change in size.
Tracking an object based on its colour
Tracking an object by its colour rests on the assumption that the colour of the object differs from the background and is unique in the scene as a whole. For instance, the HSV colour space can be used by the tracking system. In HSV, each colour is represented by a number called hue. The amount of colour is described by a number called saturation, and the brightness of the colour is given by a number called value.
This has a particular benefit over the RGB scale: a single number (hue) describes the object in spite of the several shades of colour (from dark to light) that the object may have. To implement this technique, the open source computer vision library (OpenCV) can be used. This library uses optimised C/C++ code and is aimed at real-time applications. The system uses OpenCV version 2.4.6.0 configured within Microsoft Visual Studio.
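The hue-based idea can be illustrated with a minimal sketch. It uses only Python's standard colorsys module rather than OpenCV, and the function name hue_mask, the tolerance and the minimum saturation and value thresholds are assumptions chosen for illustration, not values from this project.

```python
import colorsys

def hue_mask(image, target_hue, hue_tol=0.05, min_sat=0.3, min_val=0.2):
    """Mark pixels whose hue is close to target_hue (hue is in [0, 1))."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Hue is circular, so measure the shorter way round.
            diff = abs(h - target_hue)
            diff = min(diff, 1.0 - diff)
            # Very grey or very dark pixels have an unreliable hue, so skip them.
            is_target = diff <= hue_tol and s >= min_sat and v >= min_val
            mask_row.append(1 if is_target else 0)
        mask.append(mask_row)
    return mask

# Dark red and light red share hue 0; blue and grey do not match.
image = [[(128, 0, 0), (255, 102, 102)],
         [(0, 0, 255), (200, 200, 200)]]
print(hue_mask(image, target_hue=0.0))
```

Both shades of red are selected by the same hue value, which is the benefit over comparing RGB triples directly.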
Player Detection and Tracking Algorithm
In order to track the players over the image sequence, the initial locations of the two tennis players in the image must be found. A well-known moving object segmentation method can be used for this part of the analysis. This method is called change detection: the background is first constructed, and the foreground objects are then found by comparing the background frame with the current video frame. Background construction is a key issue, as the performance of change detection depends on the quality of the background. In most tennis video analysis, a typical frame containing the playing field includes three areas: the playing field, the area surrounding the playing field and the audience area. For the most part, the players move within the court. Moreover, the colour of the playing field is uniform, as is the colour of the surrounding area.
There are two differences when compared with conventional algorithms. A background image of better quality is extracted, and three-dimensional data is then considered, in contrast to traditional segmentation, which uses both three-dimensional and temporal data.
An object detection technique is required to segment each object automatically so that a unique track can be associated with the object. It incorporates the following stages: background estimation, background subtraction, background updating and object detection. In this project the issues that arise are as follows:
1. Extracting the background image from the sequence of images and updating the background automatically.
2. Choosing an appropriate filter to remove spurious moving objects from the background subtraction image so that the system is more robust.
3. Detecting the objects in the binary background subtraction image.
This project also discusses a target tracking method based on the Kalman filter.
The method makes full use of the prediction capability of the Kalman filter to predict the region where the target is likely to appear in the next frame. It then performs correlation matching within that small region and finds the best correlation match for the target, which makes target tracking more proactive.
Video Processing, Object Detection and Tracking
A video is a collection of static images, or frames, and associated audio information. A frame is a single picture or still shot that is shown as part of a larger video or film. Object detection in video involves verifying the presence of an object in image sequences and possibly locating it precisely for recognition. Object detection is closely related to another task in computer vision, object tracking. Object tracking deals with monitoring the temporal and spatial changes of an object in a video sequence. This is done by solving the temporal correspondence problem, that is, matching the target region in successive frames of a sequence taken at closely spaced time intervals. The task begins with detection of the objects, and detecting the object repeatedly in subsequent frames is often necessary to help and verify tracking.
Moving object detection
Detecting moving objects is a fundamental step in computer vision, on which subsequent classification and recognition depend. To perform moving object detection, the video sequence must be analysed to detect the objects that are in motion with respect to the background scene. For a stationary camera, the background is expected to be static.
Object detection can be carried out using three different kinds of algorithm:
• Optical flow
• Temporal differencing
• Background subtraction
The background subtraction technique has two important parts:
• Generating and updating a reference background image
• Performing an appropriate subtraction between the current image and the background model
Initialize Tracks
The initialise tracks function creates an array of tracks, where each track is a structure representing a moving object in the video. The purpose of the structure is to maintain the state of a tracked object. The state consists of information used for detection-to-track assignment, track termination and display. The system only displays an object after it has been tracked for some number of frames. This happens when the total visible count exceeds a predefined threshold. When no detections are associated with a track for several consecutive frames, the system assumes that the object has left the field of view and deletes the track. This happens when the consecutive invisible count exceeds a specified threshold. A track may also be deleted as noise if it was tracked for only a short time and was marked invisible for most of the frames.
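A minimal sketch of such a track structure and its display and deletion rules is given here. The field names and all threshold values (min_visible, max_invisible, age_threshold, min_visibility) are hypothetical choices for illustration and are not taken from the project code.

```python
from dataclasses import dataclass

@dataclass
class Track:
    track_id: int
    position: tuple
    age: int = 1                          # frames since the track was created
    total_visible_count: int = 1          # frames in which it was detected
    consecutive_invisible_count: int = 0  # frames in a row without a detection

def update_track(track, detection):
    """Advance the track by one frame; detection is (x, y) or None."""
    track.age += 1
    if detection is not None:
        track.position = detection
        track.total_visible_count += 1
        track.consecutive_invisible_count = 0
    else:
        track.consecutive_invisible_count += 1

def should_display(track, min_visible=8):
    # Only show objects that have been seen in enough frames.
    return track.total_visible_count > min_visible

def should_delete(track, max_invisible=20, age_threshold=8, min_visibility=0.6):
    visibility = track.total_visible_count / track.age
    lost = track.consecutive_invisible_count >= max_invisible
    noise = track.age < age_threshold and visibility < min_visibility
    return lost or noise
```

A short-lived track that is mostly invisible is removed as noise, while a track that has simply been missed for a few frames is kept until the invisible count reaches its limit.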
Background Subtraction
According to Shaikh (2014), background subtraction is the most vital part of the basic methodology for motion detection. The idea is to subtract the current image from a reference background image, which is updated over a period of time. The subtraction leaves non-stationary or new objects, including the entire silhouette region of an object. This approach is simple and computationally affordable for real-time systems, but it is extremely sensitive to dynamic scene changes caused by lighting and extraneous events.
Background subtraction has been used as a technique for motion segmentation in static scenes. It attempts to detect moving regions by subtracting the current image pixel-by-pixel from a reference background image. The creation of the background image is known as background modelling, for instance by averaging images over time in an initialisation period. The reference background is updated with new images over time to adapt to dynamic scene changes.
With background subtraction, object detection can be achieved by building a representation of the scene called the background model and then finding deviations from the model for each incoming frame. Any significant change in an image region from the background model signifies a moving object. While tracking one person against a stationary background may be relatively simple, the problem becomes very complicated with many people. They may cross in front of each other, pass behind obstacles, move through different lighting, cast shadows and form groups.
The creation of the foreground pixel map is followed by morphological closing and the elimination of small regions. Background subtraction methods perform well at extracting most of the relevant pixels of moving regions, even when they stop. However, they are sensitive to dynamic changes, for instance when stationary objects uncover the background (such as a parked car moving out of a car park) or when sudden illumination changes occur.
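The two parts of background subtraction, maintaining the reference image and thresholding the difference, can be sketched as follows. This is a simplified illustration on greyscale frames stored as nested lists; the running-average update, the learning rate alpha and the threshold are assumptions, and the project itself relies on OpenCV rather than this code.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the background model (running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background by more than threshold."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

background = [[100, 100, 100],
              [100, 100, 100]]
frame = [[100, 105, 100],
         [100, 200, 100]]

print(foreground_mask(background, frame))   # only the bright pixel is foreground
background = update_background(background, frame)
```

A small alpha makes the model adapt slowly, which keeps a briefly stationary player in the foreground at the cost of reacting slowly to lighting changes.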
Object Segmentation
The idea of the segmentation is essentially change detection. The moving object is separated from the rest of the scene using motion information. The method constructs and maintains up-to-date background information from the video sequence and compares every frame with the background. A pixel is assumed to belong to the object area when it is significantly different from the background. The technique does not attempt to obtain the shape of the object from the changing parts of the scene, because the characteristics of the changing parts are very unpredictable. They depend on texture, object motion and contrast, which are not known in advance. The focus here is on the stationary part. The technique assumes a motionless background; in most video conferencing and remote surveillance applications, the camera is fixed. The technique is divided into five steps. The first step estimates the frame difference mask by thresholding the difference between two successive input frames. In the second step, the background action step, pixels that have not moved for a long time according to the frame difference mask are considered reliable background. This step maintains an up-to-date background buffer as well as a background action mask that indicates whether the background data of each pixel is available. In the third step, the background difference mask is produced by comparing the current input image with the background image stored in the background buffer. The background difference mask is the principal information for the shape of the object. In the fourth step, the initial object mask is assembled from the frame difference mask and the background difference mask.
Where the background action mask indicates that background information for a pixel is available, the background difference mask is used as the initial object mask. Otherwise, the value in the frame difference mask is copied to the object mask. The initial object mask created in the fourth step contains noise regions because of camera noise and irregular object motion. Also, the boundary may not be very smooth. In the last step, the noise regions are removed and the initial object mask is filtered to obtain the final object mask.
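Steps one, three and four of this procedure can be sketched on small greyscale grids. The function names, the threshold and the example data are hypothetical, and the background buffer and availability mask are simply given here rather than built up over time as in the second step.

```python
def difference_mask(image_a, image_b, threshold=20):
    """Threshold the absolute per-pixel difference between two images."""
    return [[1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

def initial_object_mask(frame_diff, background_diff, background_available):
    """Use the background difference where a background value exists,
    otherwise fall back to the frame difference."""
    return [[bd if available else fd
             for fd, bd, available in zip(fd_row, bd_row, av_row)]
            for fd_row, bd_row, av_row in
            zip(frame_diff, background_diff, background_available)]

previous = [[10, 10, 10]]
current = [[10, 80, 10]]
background = [[10, 10, 90]]   # background buffer (assumed already built)
available = [[1, 0, 1]]       # 1 where the buffer holds a reliable value

frame_diff = difference_mask(previous, current)
background_diff = difference_mask(background, current)
print(initial_object_mask(frame_diff, background_diff, available))
```

The third pixel shows why the background difference matters: it has not changed between the two frames, yet it differs from the stored background and is therefore still marked as object.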
Existing Tracking Methods
Object tracking applications
A tracking technique requires an object detection mechanism for when the object first appears in the video. The usual approach to object detection uses the information in a single frame. However, some object detection techniques use temporal information computed from a sequence of frames to reduce false detections. This temporal information is generally obtained by frame differencing, which highlights the changing areas in successive frames. Given the object regions in the video sequence, it is then the tracker's task to perform object correspondence from one frame to the next to generate the tracks. The important applications include automated video surveillance and traffic monitoring.
Automated video surveillance: In these applications computer vision systems are designed to monitor the activity in an area, for instance a car park or a shopping centre. The system identifies the moving objects and reports any suspicious situation. The system must differentiate between natural entities and humans, which requires a good visual object tracking system.
Traffic monitoring: In some countries highway traffic is continuously monitored using cameras. A vehicle that breaks the traffic rules or is involved in another illegal act can be tracked down easily if the surveillance system is supported by an object tracking system.
Object tracking has been applied in a wide range of fields in recent years, such as video data compression and military affairs. Detecting and tracking moving objects in real-time image sequences is a significant task in computer vision, pattern recognition and image processing.
Player tracking plays an important role in sports analysis. Tracking people using video data is still a challenging task. Object tracking has many applications in fields such as sports, and the tracking system developed in this project is aimed specifically at movement analysis in tennis. The object tracking methods are shown in the diagram below:
The tracking system in the project is built using blob-based detection on a background-subtracted image. Using a 2D homography, the detected image positions are transformed into real-world coordinates to generate a map-based overview of the movement paths.
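The mapping from image positions to court coordinates can be sketched as applying a 3x3 homography matrix to a pixel position in homogeneous coordinates. The matrix values here are made up for illustration; in practice the matrix would be estimated from known correspondences such as the court corners.

```python
def apply_homography(h, x, y):
    """Map image point (x, y) to world coordinates using 3x3 matrix h."""
    xw = h[0][0] * x + h[0][1] * y + h[0][2]
    yw = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    # Divide by the homogeneous coordinate to get back to 2D.
    return xw / w, yw / w

# Hypothetical homography, not calibrated from real footage.
H = [[0.5, 0.0, -10.0],
     [0.0, 1.0, -20.0],
     [0.0, 0.0078125, 1.0]]

print(apply_homography(H, 100, 128))   # image pixel -> position on the court map
```

The division by w is what distinguishes a projective mapping from a simple scale and shift, and it accounts for players further from the camera appearing smaller in the image.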
A tracking mechanism requires an object detection mechanism when the object first appears in the video. Once the objects are represented using any of the mentioned models, the next step is to detect the object in the frame. This task is done when the object first appears in the video. The information about the object in the first frame is extracted to detect it. Some models use more than one frame to extract the information; this is done by frame differencing.
Without knowing what to track, we cannot perform the tracking. Object representation provides the various ways in which objects can be represented, for instance by an ellipse, a contour or a point. Objects are mainly represented by shape and appearance. Methods of object representation are points, primitive geometric shapes, object silhouette and contour, articulated shape models and skeletal models.
The first step of object tracking is the representation of the object of interest. Objects can be represented by their shape and appearance. The shape representations used for tracking are described first, followed by the joint shape and appearance representations.
1. Points: The object is represented by a point, that is, the centroid, or by a set of points. This representation is appropriate for tracking objects that occupy small areas in an image sequence.
2. Primitive geometric shapes: The shape of the object is represented by a rectangle or an ellipse. The motion of the object in this representation is generally modelled by translation or by a projective (homography) transformation. These are more appropriate for representing simple rigid objects.
3. Object silhouette and contour: The boundary of a region is defined by the contour representation. The region inside the contour is called the silhouette of the object. These are suitable for tracking complex non-rigid shapes.
4. Articulated shape models: Articulated objects are composed of body parts that are held together by joints. For instance, the human body is an articulated object with legs, hands, head and feet connected by joints. The relationship between the parts is governed by kinematic motion models, such as joint angle. Articulated objects are represented by modelling the constituent parts using cylinders or ellipses.
5. Skeletal models: The skeleton of the object can be extracted by applying the medial axis transform to the object silhouette. This model is used as a shape representation for recognising objects.
Multi object tracking
Research in multi-object tracking has made substantial progress in recent years. Even so, current techniques achieve reasonable performance only with comparatively few targets. Most of the successful tracking techniques currently perform tracking by detection, that is, the target is represented by an object model that can be detected in each frame independently. The benefits of using an object detector are that it naturally handles re-initialisation if a target disappears and that it avoids excessive model drift. For a single target, tracking amounts to fitting a single temporally consistent trajectory such that it optimally accounts for that evidence.
The multi-player tracking technique considered here is based on motion estimation together with background subtraction and occlusion detection. The technique works efficiently on video sequences. A tracking system based on background subtraction for detecting and tracking moving objects in video is presented in this project. First, filters are applied to the video frames to reduce noise in the video sequence. The background subtraction technique is then used to detect and track the moving objects. The test results with the OpenCV library show that background subtraction is effective for both detecting and tracking moving objects. The system finds moving objects by subtracting the background image from the video frames. The experimental results show that the technique runs quickly and accurately and is suitable for simultaneous detection.
To begin with, the system has to identify the number of targets and their positions, and the number may vary over time. When the detector output is only partly reliable, one has to account for missing evidence as well as incorrect evidence. Moreover, trajectories must obey certain constraints, such as that two targets cannot be at the same position at the same instant.
Labelling each detection as either belonging to a certain target or being a false alarm is inherently a problem in a discrete space. In the observed scene, the same detection can have only a single label. On the other hand, the locations of the targets over time are given in a continuous state space, which may also include further dimensions such as speed and size.
The object to be tracked is called the target, and the target is represented by its model. The model is given as a single histogram or multiple histograms of one or more dimensions, depending on the dimensionality of the discriminative features. Similarly, the target candidate is a set of pixels that possibly matches the target model and is also represented by its histograms.
Multiple Hypothesis Tracking (MHT)
Multiple Hypothesis Tracking (MHT) requires several frames to be observed for the tracking to be accurate. MHT is an iterative technique. It starts with a set of current track hypotheses. For each hypothesis, a prediction of each object's position in the succeeding frame is made. The predictions are then compared with the measurements by calculating a distance measure. MHT is capable of tracking multiple objects, handling occlusions and calculating optimal solutions.
Mean Shift Method
According to Shaikh (2014), mean shift is a technique for feature space analysis. It is an iterative procedure that shifts each data point to the average of the data points in its neighbourhood, as in clustering.
Mean shift tracking techniques assume a fixed colour distribution. In real applications the colour distribution can change. The CAMSHIFT technique, which stands for Continuously Adaptive Mean Shift, can handle a changing colour distribution by adapting the search window size and computing the colour distribution in the search window.
Mean shift is a nonparametric technique that finds the mode of a distribution, that is, the location of the maximal probability density, as proposed by Fukunaga and Hostetler. To do so, the mean shift technique climbs the gradient of the probability density function. In computer vision the technique has been used for clustering and object tracking. A locality-sensitive hashing method has also been used with it, which allows a significant reduction in computation. Comaniciu proposed mean shift for real-time tracking in colour videos. The mean shift technique has also been proposed for detecting and tracking text in images. We consider the technique reformulated to operate in the domain of fuzzy membership fields rather than probability, which generally allows a more convenient definition of, and operations on, tracked features.
Algorithmic Aspects of the Mean Shift Tracking
Accurate visual object tracking under the constraint of low computational complexity is a challenge. Tracking of visual objects can be done either by forward tracking or by back tracking. The forward-tracking approach estimates the positions of the regions in the current frame using the segmentation result from the previous image. The back-tracking approach segments the foreground regions in the current image and then establishes the correspondence of regions with the previous image. Several object templates are used to establish correspondence. One possible forward-tracking technique is mean shift analysis. The mean shift technique was first presented in 1975, and the approach has been described by D. Fuiorea.
The mean shift technique is a non-parametric algorithm. It provides accurate localisation through efficient matching without an exhaustive search. The technique computes the mean shift value for the current point position, moves the point to its mean shift value as the new position, and then recomputes the mean shift until a certain condition is met.
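The iteration described above can be sketched with a flat (uniform) kernel on a set of 2D points. This is a simplified illustration: in a tracker the points would be weighted by how well each pixel matches the target's colour histogram, and the bandwidth, tolerance and sample data here are assumptions.

```python
import math

def mean_shift(points, start, bandwidth, tol=1e-3, max_iter=100):
    """Move from start towards the densest nearby region of points."""
    x, y = start
    for _ in range(max_iter):
        # Points inside the window of radius bandwidth around (x, y).
        window = [(px, py) for px, py in points
                  if math.hypot(px - x, py - y) <= bandwidth]
        if not window:
            break
        mean_x = sum(p[0] for p in window) / len(window)
        mean_y = sum(p[1] for p in window) / len(window)
        shift = math.hypot(mean_x - x, mean_y - y)
        x, y = mean_x, mean_y       # move to the mean of the window
        if shift < tol:             # stop once the position has converged
            break
    return x, y

points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
print(mean_shift(points, start=(2, 2), bandwidth=3))
```

Starting from (2, 2), the window captures the four clustered points and ignores the outlier at (10, 10), so the estimate settles on the centre of the cluster after two iterations.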
In mean shift tracking, the kernel bandwidth is significant, since it determines the number of samples taking part as well as reflecting the scale of the tracking window. The traditional mean shift procedure is limited by its fixed kernel bandwidth.
Layering based tracking
This is another kernel-based tracking technique, used when multiple objects are tracked. Each layer consists of a motion model, such as translation and rotation, and a layer appearance based on intensity. Layering is achieved by first compensating for the background motion so that the object's motion can be estimated by means of 2D parametric motion. Each pixel's probability is computed from the object's previous motion and shape features. The technique can track multiple objects and handle full occlusion of objects.
Blob Analysis
Blob tracking may be simple and fast, but it does not always work well, especially with people moving in groups. People coming from supermarkets or similar shops often push trolleys in front of them. These trolleys are often about the same size as a person. This makes splitting blobs by height or width a hard task, as movement must be taken into account; it is difficult to tell whether a blob is a single person or is merged from several people. This becomes hard to determine in busy areas. Still, large blobs are expected to be groups of people or other moving things. People standing in groups are hard to deal with, even for the human eye. When they stand close together or hold hands, they appear as one big blob. Blob detection (blob analysis) involves examining the blob image and finding each individual as a foreground object. In some cases this can be a rather difficult task, even for the human eye. When the moving persons have colours similar to the background, the blobs become rather vague, and awkward situations arise when blobs contain holes. Dilation is not sufficient in this case to close the blob completely.
Blob Detection
Blob detection is aimed at detecting regions in a video sequence that differ in properties, such as brightness or colour, from the surrounding area. A blob is a region of an image in which some properties are constant or approximately constant.
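One simple way to obtain blobs from a background-subtracted binary mask is connected-component labelling, sketched here. This is an assumption about how blobs could be extracted and is not one of the differential or local-extrema detectors; the function name find_blobs, the 4-connectivity and the min_area filter are illustrative choices.

```python
from collections import deque

def find_blobs(mask, min_area=1):
    """Group 4-connected foreground pixels; return area and centroid of each."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from this unvisited pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for nr, nc in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                    if 0 <= nr < rows and 0 <= nc < cols \
                            and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) >= min_area:   # drop small regions as noise
                cx = sum(p[1] for p in pixels) / len(pixels)
                cy = sum(p[0] for p in pixels) / len(pixels)
                blobs.append({"area": len(pixels), "centroid": (cx, cy)})
    return blobs

mask = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1]]
print(find_blobs(mask))
```

Raising min_area discards small regions, which corresponds to the elimination of small areas after background subtraction.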
There are two main classes of blob detectors: differential methods, and methods based on local extrema, which find the local maxima and minima of a function. They are also referred to as region operators and are related to interest point detection and corner detection.
An interest point detection algorithm detects interest points for subsequent processing. An interest point is a point in the image that has a well-defined position in image space.
A corner detection algorithm is used to extract certain kinds of features and infer the contents of an image. It is suitable for video tracking and object recognition. Corner detection overlaps with the topic of interest point detection.
Blob Tracking
Blob tracking provides a way to track collections of pixels from one image to the next. As part of any object tracking it is commonly necessary to identify the objects when comparing the current frame with the previous frame. Blob tracking labels each blob with a specific ID that is attached to the same or a similar blob in the next frame. How similarity between blobs in the two images is defined depends on how the blob tracker is configured.
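The labelling step can be sketched as a greedy nearest-centroid match between the previous and current frames. Using centroid distance as the similarity measure, the max_distance limit and the function name assign_ids are assumptions for illustration; a real tracker may also compare size or colour.

```python
import math

def assign_ids(previous, current, next_id, max_distance=50.0):
    """previous: {blob_id: (x, y)}; current: list of (x, y) centroids.
    Returns ({blob_id: (x, y)} for the current frame, updated next_id)."""
    unused = dict(previous)
    assigned = {}
    for cx, cy in current:
        best_id, best_dist = None, max_distance
        for blob_id, (px, py) in unused.items():
            dist = math.hypot(cx - px, cy - py)
            if dist <= best_dist:
                best_id, best_dist = blob_id, dist
        if best_id is None:
            # No previous blob is close enough: treat as a new object.
            best_id = next_id
            next_id += 1
        else:
            del unused[best_id]   # each previous ID is reused at most once
        assigned[best_id] = (cx, cy)
    return assigned, next_id

previous = {0: (10, 10), 1: (100, 100)}
current = [(102, 98), (12, 11), (300, 300)]
print(assign_ids(previous, current, next_id=2))
```

The two nearby centroids keep IDs 1 and 0, while the distant one receives the new ID 2. The caller is expected to pass a next_id larger than every ID already in use.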
I am tracking three blobs in the figure above, and the blue blob marks the predicted next position of the players. I use a prediction algorithm to predict these positions in order to get accurate results.
Each blob in the current frame, shown with a blue circle, has a contour with a specific convex hull, which the system uses to measure the distance to the predicted position of the player. The first blue circle is the blob in the current frame. The distance between the dashed circle and the second circle is computed, and this determines whether the distances between the three blue circles are large enough to predict the next position of the players.
Silhouette Tracking
Silhouette tracking is performed by estimating the object region in each frame. Silhouette tracking methods use the information encoded inside the object region. This information can be in the form of appearance density and shape models, which are usually in the form of edge maps. Given the object models, silhouettes are tracked by either shape matching or contour evolution. These methods can essentially be considered as object segmentation applied in the temporal domain using the priors generated from the previous frames.
Objects can have complex shapes, such as the head, hands, shoulders and legs, that cannot be described well by simple geometric shapes. Silhouette-based techniques give an accurate shape description for these objects. The goal of a silhouette-based object tracker is to find the object region in each frame by means of an object model generated using the previous frames. This model can be in the form of a colour histogram, object edges or the object contour. We divide silhouette trackers into two categories, namely shape matching and contour tracking. Shape matching approaches search for the object silhouette in the current frame.
Shape Matching
Shape matching can be performed in a similar way to tracking based on template matching, where an object silhouette and its associated model are searched for in the current frame. The search is performed by computing the similarity of the object with the model generated from the estimated object silhouette based on the previous frames. In this approach, the silhouette is assumed only to translate from the current frame to the next, so non-rigid object motion is not explicitly handled. The object model, which is usually in the form of an edge map, is reinitialised in each frame after the object is found, in order to handle appearance changes.
Another approach to matching shapes is to find corresponding silhouettes detected in two subsequent frames. This is similar to point matching. Silhouette matching uses the whole object region, in contrast to using points: it makes use of the object's appearance features, while point matching uses only motion- and position-based features. Silhouette detection is usually carried out by background subtraction. Once the object silhouettes are extracted, matching is performed by computing some distance between the object models associated with each silhouette.
Contour Tracking
Contour tracking techniques iteratively evolve an initial contour in the previous frame to its new position in the current frame. Tracking by evolving a contour can be performed using two different approaches. The first approach uses state space models to model the contour shape and motion. The second approach directly evolves the contour by minimising the contour energy using direct minimisation techniques such as gradient descent.
Point Tracking
In an image sequence, moving objects are represented by their feature points during tracking. Point tracking is a complex problem, especially in the presence of occlusions and false detections of objects. Detection can be done relatively simply by identifying these points.
Tracking can be formulated as the correspondence of detected objects represented by points across frames. Point correspondence is a complicated problem, especially in the presence of occlusions, misdetections, and entries and exits of objects. Point correspondence methods can be divided into two categories, namely deterministic and statistical methods. The deterministic methods use qualitative motion heuristics to constrain the correspondence problem. Probabilistic methods, on the other hand, explicitly take the object measurement and its uncertainties into account to establish correspondence.
Estimating Positions by Kalman Filtering
The Kalman filter processes all available measurements, regardless of their precision, to estimate the current value of the variables of interest. It provides an estimate of the current parameters using current measurements and previous parameter estimates, and it is an optimal recursive data processing algorithm. The Kalman filter propagates the conditional probability density. It is a set of mathematical equations that provides an efficient computational means of estimating the state of a process. Its main properties are as follows:
• It supports estimation of past, present and even future states, and it can do so even when the precise nature of the modelled system is unknown.
• The Kalman filter estimates a process by using a form of feedback control.
• The filter estimates the process state at some time and then obtains feedback in the form of noisy measurements.
The equations for the Kalman filter fall into two groups: time update equations and measurement update equations. The time update equations are responsible for projecting forward the current state and error covariance estimates to obtain the a priori estimates for the next time step. The measurement update equations are responsible for the feedback.
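The two groups of equations can be sketched for a single coordinate of a player's position. The sketch assumes a constant-velocity motion model with state (position, velocity), a measurement of position only, and hypothetical noise values q and r; none of these choices is taken from the project code.

```python
def kalman_step(state, cov, z, dt=1.0, q=0.01, r=1.0):
    """One predict/update cycle for a 1D constant-velocity Kalman filter.
    state = (position, velocity); cov = 2x2 error covariance as nested tuples."""
    p, v = state
    (a, b), (c, d) = cov

    # Time update: project the state and covariance forward (F P F^T + Q).
    p_pred = p + dt * v
    v_pred = v
    a_pred = a + dt * (b + c) + dt * dt * d + q
    b_pred = b + dt * d
    c_pred = c + dt * d
    d_pred = d + q

    # Measurement update: correct the prediction with the measured position z.
    s = a_pred + r                      # innovation covariance
    k0, k1 = a_pred / s, c_pred / s     # Kalman gain
    y = z - p_pred                      # innovation (measurement residual)
    new_state = (p_pred + k0 * y, v_pred + k1 * y)
    new_cov = (((1 - k0) * a_pred, (1 - k0) * b_pred),
               (c_pred - k1 * a_pred, d_pred - k1 * b_pred))
    return new_state, new_cov

# Feed positions of an object moving one unit per frame.
state, cov = (0.0, 0.0), ((1000.0, 0.0), (0.0, 1000.0))
for frame in range(1, 51):
    state, cov = kalman_step(state, cov, float(frame))
print(state)   # position close to 50, velocity close to 1
```

The predicted position for the next frame is the current position plus dt times the estimated velocity, which gives the small search region in which the target is expected to appear.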
Frame Difference
In this technique an image of the background without any moving objects is taken as the reference image. For each colour channel, the pixel value of the background image is subtracted from the corresponding pixel value of the input image. If the resulting difference is larger than a particular threshold value, the pixel is classified as foreground; otherwise it is classified as background. This technique is simple and easy to implement, but the results are not precise enough, because changes in the background brightness cause errors.
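The thresholding rule can be sketched as follows. This is a simplified illustration under stated assumptions: a single-channel image stored row-major in a std::vector, and a hypothetical function name frameDifferenceMask. It is not the code of the system described in this report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Returns a binary mask: 255 where the frame differs from the reference
// background by more than `threshold`, 0 elsewhere.
std::vector<std::uint8_t> frameDifferenceMask(const std::vector<std::uint8_t>& background,
                                              const std::vector<std::uint8_t>& frame,
                                              int threshold) {
    assert(background.size() == frame.size());
    std::vector<std::uint8_t> mask(frame.size(), 0);
    for (std::size_t i = 0; i < frame.size(); ++i) {
        // Work in int so the subtraction cannot wrap around.
        const int diff = std::abs(static_cast<int>(frame[i]) - static_cast<int>(background[i]));
        if (diff > threshold) {
            mask[i] = 255;  // foreground pixel
        }
    }
    return mask;
}
```

For a colour image the same test would be applied to each channel. A larger threshold suppresses more brightness noise but also misses low-contrast parts of the players.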
Optical Flow
According to Shaikh (2014), optical flow is the pattern of apparent motion of objects, surfaces and edges in a visual scene. In this approach the optical flow field of the image is estimated, and clustering is performed according to the optical flow distribution characteristics of the image. This technique obtains the complete motion information and detects the moving object. However, the method is sensitive to noise, that is, it has poor anti-noise performance.
A central problem in the processing of image sequences is the measurement of optical flow, or image velocity. The goal is to compute an approximation to the 2D motion field. Once computed, the measurements of image velocity can be used for a wide variety of tasks ranging from passive scene interpretation to autonomous, active exploration.
Optical flow techniques use the flow vectors of moving objects over time to detect moving regions in an image. In this technique, the apparent velocity and direction of every pixel in the frame have to be computed. The background motion model, which serves to stabilize the image of the background plane, can be estimated using optical flow. Many optical flow techniques are computationally complex and cannot be used in real time without specialized hardware.
Particle Filtering
The particle filter based tracking technique maintains a probability distribution over the location and scale of the object/player being tracked. It is a technique for performing inference in state-space models, where the state of the system evolves over time and is observed through noisy measurements. It generates all the models for one variable before moving to the next variable. This is an advantage when new variables are created, and it allows a new operation, resampling. One limitation of the Kalman filter is its assumption that the state variables follow a Gaussian distribution, so it gives poor estimates of state variables that are not Gaussian distributed. This limitation can be overcome by using particle filtering. The technique generally uses contours, colour features or texture mapping.
Simple Template Matching
The simple template matching technique examines a region of interest in the video sequence. In this technique, a reference image is compared with each frame extracted from the video sequence. Tracking can be carried out on a particular object in the video sequence, and overlapping of objects is handled separately. Template matching is a technique for processing digital images to find small parts of an image that match, or are equivalent to, a model image in each frame. The matching procedure places the image template at all possible positions in the source image and computes a numerical index that indicates how well the model fits the image at that position. The technique is capable of tracking a particular image and of dealing with partial occlusion of the object.
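A minimal sketch of this exhaustive search is given below. It assumes greyscale images stored row-major as integers and uses the sum of squared differences as the numerical index, which is one common choice among several; the names Match and matchTemplateSSD are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Best position of the template and its score (lower is better).
struct Match {
    int x;
    int y;
    long long score;
};

// Slides the template over every possible position in the image and keeps
// the position with the smallest sum of squared differences (SSD).
Match matchTemplateSSD(const std::vector<int>& image, int iw, int ih,
                       const std::vector<int>& tmpl, int tw, int th) {
    assert(tw > 0 && th > 0 && tw <= iw && th <= ih);
    assert(image.size() == static_cast<std::size_t>(iw) * ih);
    assert(tmpl.size() == static_cast<std::size_t>(tw) * th);
    Match best{0, 0, std::numeric_limits<long long>::max()};
    for (int y = 0; y + th <= ih; ++y) {
        for (int x = 0; x + tw <= iw; ++x) {
            long long ssd = 0;
            for (int v = 0; v < th; ++v) {
                for (int u = 0; u < tw; ++u) {
                    const long long d = image[(y + v) * iw + (x + u)] - tmpl[v * tw + u];
                    ssd += d * d;
                }
            }
            if (ssd < best.score) {
                best = Match{x, y, ssd};
            }
        }
    }
    return best;
}
```

A score of zero means the template matches the image exactly at the returned position. The cost grows with the product of the image and template sizes, which is why the search is usually restricted to a region of interest around the last known position.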
KLT and TLD
According to Shaikh (2014), the Kanade-Lucas-Tomasi (KLT) feature tracker is a feature extraction technique based on the early work of Lucas and Kanade on an iterative image registration technique, which uses the spatial intensity gradients to guide the search towards the best match.
TLD (Tracking-Learning-Detection) is a technique for real-time tracking of unknown objects in video sequences. The object of interest is specified by a bounding box in a single frame. TLD tracks the object and, at the same time, learns its appearance so that it can detect the object whenever it appears in the video. Tracking objects in highly cluttered scenes is challenging. For the task of following agile moving objects in the presence of dense background clutter, probabilistic algorithms are essential. Techniques based on the Kalman filter are restricted in the range of probability distributions they can represent.
Simple algorithm for tracking
Ground-Truth Data Collection
According to Bogulow Cyanek (2013, John Wiley & Sons), ground-truth data allows the performance of machine learning techniques to be verified. The process of acquiring it is demanding and tedious, because of the high requirements placed on this kind of data. Acquisition of ground-truth data can be facilitated by an application built for this purpose. Such an application allows different modes of point selection, such as individual point positions as well as rectangular and polygonal outlines of visible objects. The figure below shows its operation for points marked inside the border of a road sign. During ground-truth data collection, only the positions of the points are saved as meta-data to the original image. These positions can then be used to obtain image features, in this case colour in the chosen colour space. Data of this kind was used to collect point samples for pixel-based classification in human skin selection and road sign recognition.
Figure 1 View of the application for manual point marking in images. Only the positions of the selected points are saved as meta-data to the original image. These can be used to obtain image features, such as colour, at the indicated places.
Related work
Player tracking is a major research topic in computer vision and has attracted great interest for many years. This project concentrates on recent advances in player tracking, in particular multi-player tracking.
Multi-player tracking techniques fall into two categories. The first category uses only information from previous frames to estimate the current state recursively. Early Kalman filtering approaches could only model linear target motion, whereas more advanced filters such as particle filters can handle more challenging cases. However, the number of particles required to approximate the posterior accurately in challenging circumstances grows quickly and is hard to handle in practice. The second category allows for a certain latency and globally solves for all trajectories within a given time window. In this case, it is common practice to restrict the optimization to a finite state space.
Challenges
Although object tracking systems have been researched for many years, the detection and tracking of objects is still an open research problem. An accurate, high-performance technique remains a challenge. The difficulty of the task depends heavily on how the object to be detected and tracked is described. If only a few visual features, such as a specific colour, are used to represent an object, it is fairly easy to detect all pixels with the same colour as the object. The other challenge is to detect the body of the player accurately and to track it appropriately. Many of the difficulties come from the variability of the image in the video sequence, because objects in a video sequence are generally moving. As an object moves through the field of view of a camera, its image may change dramatically. This variability comes from three principal sources: variation in target pose or target deformations, variation in illumination, and partial or full occlusion of the target.
There are two sources of information in a video sequence that can be used for the detection and tracking of players: visual features (such as colour, shape and texture) and motion information. Combining statistical analysis of visual features with temporal motion information leads to more robust techniques. A good strategy is to segment each frame into regions based on colour and texture information and then to merge regions with similar motion vectors, subject to certain constraints such as adjacency. A large number of techniques have been proposed in the literature. They draw on different research areas, because each of them deals with one aspect of player detection and tracking, or with a specific problem or scenario. Many of the techniques are used in combination, and different methods overlap. This makes a uniform classification of existing techniques very difficult. In the paragraphs below, the techniques are reviewed separately in association with different research highlights.
The objective of tracking a player is to generate the trajectory of the player over time by locating the player's position in every frame of the video sequence. A tracker may also provide the complete region of the image that is occupied by the player at every time instant. Detecting the player and establishing correspondence between player instances across frames can be performed either separately or jointly. In the joint case, the player region and the correspondence are estimated together by updating the player location and region information obtained from previous frames. In a tracking technique the players are represented using appearance and shape models. If the player is represented as a point, then only a translational model can be used. Where a geometric shape representation such as an ellipse is used for the player, the representation can approximate the motion of rigid objects in the scene. For a non-rigid object, a silhouette or contour is the most descriptive representation, and both parametric and nonparametric models can be used to specify its motion.
Tracking systems can be divided into two groups. The first group consists of generic trackers that use only a minimum amount of a priori information, such as the mean-shift technique and the colour-based particle filter. The second group consists of trackers that use a very specific model of the object.
Players in video tracking systems often have to be tracked in challenging environments characterised by variable visibility, for instance shadows, occlusions, and the presence of spurious objects and background clutter. As a consequence, visual object tracking still suffers from a lack of robustness, caused by temporary occlusions when objects cross and by changing lighting conditions. The main difficulty in video tracking is to associate target locations in consecutive video frames, especially when the objects/players are moving fast relative to the frame rate. Video tracking systems generally use a motion model, which describes how the image of the target might change for different possible motions of the player being tracked.
Several techniques have been introduced and implemented to solve the challenges that arise in the video tracking process; these techniques include mean shift and optical flow. The role of a tracking technique is to analyse the video frames in order to estimate the motion parameters. These parameters characterise the location of the target.
Synthesis
Design
User Interface Design
The initial design phase began by analysing the system requirements and researching existing methods and algorithms for the tennis player tracking system. A good understanding of what the system needed to achieve was gathered, along with knowledge of the existing packages for tracking systems. This will allow the system to be easy to use as well as visually appropriate.
The user interface design shows the initial layout, presenting how the system will flow as well as a general idea of its visual design. As the system developed, a number of amendments were made to allow the system to track each frame appropriately.
The user interface wireframes show the finalised designs used by the system. The figure below shows how the system views are linked together, which allows the user to work through the expected functionality from the start. From the "Load video frame" section, the system starts capturing by loading the video in preparation for tracking. The functionality of each section is described below.
Page Functions
Load Video Frame: This method loads the video frame in preparation for tracking.
Calculate each pixel between frames: This method calculates the per-pixel difference between frames.
Detect the players: This method detects both players using a blob detection algorithm.
Find connected components of the players: This method finds the connected components and determines which components belong to the players.
Show images of the players: This method draws rectangular boxes around the players and shows the images.
Track Players: This method tracks the players as they move, using a blob tracking method.
Figure 2
High quality video sequence
To make sure that the system runs smoothly, the video files should be of high quality. Low-quality video has a negative impact on the system, which could then have problems detecting the players. Using video of consistent quality enables the system to detect and track the players easily and accurately.
System Development principles
To complete the system development successfully, the product requirements, the development model and the programming language should be decided before the beginning of the project. The system for this project will be written in C++ using the OpenCV library to track the players, and it will be developed in Microsoft Visual Studio 2015.
Microsoft Visual Studio supports a variety of programming languages, such as C++ and C#. OpenCV, which stands for Open Source Computer Vision Library, is an open source computer vision and machine learning software library. Each of these programming languages has its own benefits; C++ was preferred because it is capable of tracking objects with the relevant software. Once C++ had been chosen, the program design was selected so that the quality of the design would be consistent through to the end.
OpenCV is one of the most common libraries used for tracking objects, because it has many built-in basic image processing functions for developing applications.
Methodology
Algorithm
The steps of the complete algorithm are summarised below:
1. The first frame is read from the webcam and the colour of the object to be tracked is found in the HSV (Hue-Saturation-Value) colour space, because the hue value depends only on the hue and stays the same from dark to light.
2. The image is then thresholded using the HSV values.
3. The two first-order moments, moment10 and moment01, and the zero-order moment (area) are calculated.
4. Moment10 is divided by the area to obtain the X coordinate and, similarly, moment01 is divided by the area to obtain the Y coordinate.
5. The centre of the object found in this way is saved; the next frame is then captured and the steps are repeated for it.
6. The centre of the object in the current frame is then joined to the previous one, which gives a line drawing that represents the motion path.
This method was implemented in OpenCV 2.4.6.1 and the results obtained were successful in real time. The technique can track the object even when the background is not constant; the only limitation is that the colour of the object must be distinctive. Occlusion of the object has no lasting effect on performance: even if the object is completely covered for some frames, tracking resumes when the occlusion is removed. Movement of the camera during the video also has little effect on tracking performance. It should be kept in mind that the colour of the object to be tracked must be unique compared with its background.
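Steps 3 and 4 of the algorithm above can be sketched without any library as follows. The sketch assumes that the thresholded image is a binary mask stored row-major, in which every non-zero pixel counts equally; the names Centroid and centroidFromMask are hypothetical and do not come from the project code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Centre of the thresholded object; `found` is false for an empty mask.
struct Centroid {
    bool found;
    double x;
    double y;
};

// Computes the zero-order moment (area) and the two first-order moments of
// a binary mask, then divides to obtain the centre of the object.
Centroid centroidFromMask(const std::vector<std::uint8_t>& mask, int width, int height) {
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    double m00 = 0.0;  // area
    double m10 = 0.0;  // sum of x over object pixels
    double m01 = 0.0;  // sum of y over object pixels
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (mask[static_cast<std::size_t>(y) * width + x] != 0) {
                m00 += 1.0;
                m10 += x;
                m01 += y;
            }
        }
    }
    if (m00 == 0.0) {
        return Centroid{false, 0.0, 0.0};  // nothing passed the threshold
    }
    return Centroid{true, m10 / m00, m01 / m00};
}
```

Joining the centroid of each frame to that of the previous frame gives the motion path of step 6. The empty-mask case has to be handled explicitly, otherwise the division by the area fails when the object is fully occluded.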
Object tracking with the use of contours
The optimization strategy
A contour is a set of points that represents, directly or indirectly, a curve in an image. The representation depends on the particular situation. There are many ways to represent a curve. In OpenCV, contours are represented by sequences in which every entry encodes information about the location of the next point on the curve. To find the average value of each pixel over a set of frames, the pixel values of all frames are added and the sum is divided by the total number of frames. An alternative that is often used is the running average.
System flow chart
The running average is calculated with a weighting parameter. If this parameter is held constant, the result of plain summation and the running average can differ. The parameter determines the extent to which previous frames affect the accumulator; correspondingly, it sets the time it takes for the effect of past frames to fade.
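The formula itself is not reproduced in this report. The sketch below assumes the commonly used update rule acc = (1 - alpha) * acc + alpha * frame, which is the rule documented for OpenCV's accumulateWeighted function; the parameter name alpha and the function name accumulateRunningAverage are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Updates the accumulator in place with one new frame.
// alpha close to 0: past frames dominate and the background adapts slowly.
// alpha close to 1: the accumulator follows the latest frame almost exactly.
void accumulateRunningAverage(std::vector<float>& acc,
                              const std::vector<std::uint8_t>& frame,
                              float alpha) {
    assert(acc.size() == frame.size());
    assert(alpha >= 0.0f && alpha <= 1.0f);
    for (std::size_t i = 0; i < acc.size(); ++i) {
        acc[i] = (1.0f - alpha) * acc[i] + alpha * static_cast<float>(frame[i]);
    }
}
```

With this rule the influence of a frame decays geometrically, so after a few multiples of 1/alpha frames its contribution is negligible. This is why a stationary player is gradually absorbed into the background when alpha is large.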
The steps of the contour-based tracking algorithm are listed below:
1. The first frame is read from the standard PETS2001 dataset video.
2. Using the subsequent frames, the running average is calculated and accumulated.
3. For each frame, the accumulated average is subtracted from the frame.
4. The difference is converted to grayscale using the standard formula.
5. The grey image is thresholded to make the differences more pronounced.
6. Noise is reduced by the morphological operations of erosion and dilation.
7. Contours are then calculated on the difference image.
8. For each contour, the bounding rectangle is determined and drawn around it to show the object being tracked.
9. Steps 2-8 are repeated until the end of the video.
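The final stage of the list above, going from a thresholded difference image to one rectangle per object, can be sketched as follows. As a stand-in for a library contour finder, the sketch labels 4-connected regions with a flood fill and records the extent of each region; the names Box and boundingBoxes are hypothetical, and the mask is assumed to be stored row-major.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Axis-aligned rectangle with inclusive corner coordinates.
struct Box {
    int x0, y0, x1, y1;
};

// Returns one bounding box per 4-connected foreground region, in scan order.
std::vector<Box> boundingBoxes(const std::vector<std::uint8_t>& mask, int width, int height) {
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    std::vector<char> seen(mask.size(), 0);
    std::vector<Box> boxes;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t start = static_cast<std::size_t>(y) * width + x;
            if (mask[start] == 0 || seen[start]) continue;
            Box box{x, y, x, y};
            std::vector<std::pair<int, int> > stack;
            stack.push_back(std::make_pair(x, y));
            seen[start] = 1;
            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                // Grow the rectangle to include this pixel.
                box.x0 = std::min(box.x0, p.first);
                box.y0 = std::min(box.y0, p.second);
                box.x1 = std::max(box.x1, p.first);
                box.y1 = std::max(box.y1, p.second);
                for (int k = 0; k < 4; ++k) {
                    const int nx = p.first + dx[k];
                    const int ny = p.second + dy[k];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const std::size_t idx = static_cast<std::size_t>(ny) * width + nx;
                    if (mask[idx] != 0 && !seen[idx]) {
                        seen[idx] = 1;
                        stack.push_back(std::make_pair(nx, ny));
                    }
                }
            }
            boxes.push_back(box);
        }
    }
    return boxes;
}
```

In practice very small boxes would be discarded as noise before drawing, which complements the erosion and dilation step.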
Background image
Quality of Service and Quality of Experience Layers
Various solutions for Quality of Service have been proposed at many layers of the seven-layer OSI model. Two layers in particular are used for Quality of Service: the application layer and the network layer. The application layer comprises the services that are made available to the application in order to attain the required Quality of Service.
Quality of Service at the application layer deals with parameters such as the frame rate, resolution and motion tracking/audio codecs, among others, while the network layer considers parameters such as delay and packet loss. Quality of Service at the application layer is based on human perception; motion tracking, for example, relies on two attributes, spatial and temporal perception. For the coding of motion tracking, three mechanisms are applied to achieve compression, including intra-frame and entropy coding. Quality of Service at the network layer is classified into two major types: prioritization and resource reservation.
Various mechanisms can be used to provide Quality of Service at the network layer, such as Differentiated Services, Integrated Services and Multi-Protocol Label Switching. The schematic below presents the relation between Quality of Service and Quality of Experience, which is divided into three zones. When the Quality of Service disturbance lies within zone 1, the Quality of Experience has a high value.
That is, the appreciation of the user is not affected. The Quality of Experience decreases when the Quality of Service disturbance reaches zone 2. Finally, when the Quality of Service disturbance increases to zone 3, the Quality of Experience may fall further. That is, the appreciation of the user will be highly affected and they may stop using the service altogether. Normally, as the Quality of Service disturbance parameter increases, the Quality of Experience metric and the perception of quality decrease.
System requirements and Specification
The requirements that the software must meet are listed below. Meeting them ensures that the software functions as expected and to a high standard of quality.
• The system must run in Microsoft Visual Studio.
• The system must be written in C++/Java.
• The system must capture and run the video sequence.
• The system must extract the background image automatically.
• The system must be able to detect the players.
• The system must predict the current and future locations of the players.
• The system must draw contours on the tennis players.
• The system must track the players when they are moving.
Software Usage
Microsoft Visual Studio: Microsoft Visual Studio is an integrated development environment (IDE) for Windows which can produce both native code and managed code. It supports different programming languages, such as C++, and allows us to edit and debug code.
OpenCV Library: OpenCV is a library of programming functions for computer vision, used here for visual tracking. The library is mainly used for tracking, but it also allows us to track statistical data.
Visual Studio C++: Microsoft Visual C++ is an integrated development environment (IDE) featuring tools for developing and debugging C++ code. It allows us to write, build and run the tracking system in C++.
Tools & Techniques
Implementation
Object detection and tracking is a technique required to segment each player automatically, so that a unique track can be associated with each object. The technique has five stages, listed below:
• Estimation of the background.
• Updating of the background.
• Background subtraction.
• Moving cast shadow elimination.
• Object detection and tracking.
Loading video Frame
This section of the code captures the video of the tennis game. The system loads the video frame in preparation for image subtraction.
Image Subtraction
This section performs image subtraction on the current frame: it subtracts the background image from the frame.
Current Frame Blob
This section shows the current frame blobs.
Declaring Colour
This section of the code declares the colours by using cv::Scalar.
Function Prototypes
Contours
The above code shows the current contours.
Current Frames
The above code shows current frame blob.
Difficulties
Object detection and tracking remains an open research problem even though it has been studied for several years. The difficulty of the problem depends heavily on how one defines the object to be detected and tracked. There are a number of areas which could be further improved. Capturing the video frame is relatively simple. It is necessary for the background model to adapt to gradual changes in the appearance of the scene. For instance, in outdoor settings the light intensity typically varies during the day. Sudden illumination changes can also occur in the scene, for example when a light is suddenly switched on.
One of the difficulties is that background clutter makes the task of segmentation difficult. It is hard to model a background that reliably represents the cluttered background and separates the moving foreground objects from it.
Some objects may differ only slightly from the appearance of the background, which makes correct classification difficult. This is especially important in surveillance applications. Camouflage is a particular problem for temporal differencing methods. Shadows cast by foreground objects often complicate the processing steps that follow background subtraction. Researchers have proposed different methods for the detection of shadows.
The speed of a moving object has a significant effect on its detection. If an object is moving slowly, the temporal differencing technique will fail to detect the portions of the object that lie in uniform regions. Conversely, an object that moves at a faster pace leaves a trail of ghost regions behind it in the detected foreground mask.
Detecting moving objects becomes very difficult when videos are captured in challenging weather conditions such as fog, storm and snow. Such conditions make it difficult to detect and track the players.
Testing
The agile development methodology used for the project makes it easy to work through the system's requirements. It ensures that each small section of the development is accurate and performs as expected. Any of these sections of code may behave unexpectedly under certain conditions, which would lead to the system not performing as expected, for example crashing or tracking incorrectly.
To prevent this, a number of key sections of the code will undergo component testing, which means that the functions will be tested by writing comments and additional code to ensure that they perform as expected. In addition to component testing, the system must be tested with different videos and by users, to ensure that it is responsive and behaves according to the requirements. This testing will be carried out mainly by the author of the project.
The starting point is to test the detection and tracking of the players, because these are the core functionality of the system. Several functions are in place, as discussed previously: image matching for alignment and mosaicking is used to match the players, and background subtraction is applied to the video frame. When the system detects the players by blob detection and tracks them accurately by blob tracking, the system works as expected.
The areas of the system to be tested include the image matching, detection and tracking components. The core functionality, namely blob detection and tracking as well as image matching, must be tested. The full test results can be found below.
The screenshot below was taken while the system was running. As the image shows, the system tracks the players by drawing contours around them and following them.
Figure 1 shows the comparative results against ground-truth data on the sequence below. As shown in the plot, the proposed method produces very competitive results. It performs best, as it contains almost no false detections due to background dynamics.
Figure 1
Figure 2 shows the output masks of the two players. As the output mask shows, the pixel intensities are thresholded or filtered. The accuracy of this approach depends on the speed of movement of the players in the scene. Faster movements may require higher thresholds.
Figure 2
Figure 3 below shows the output mask of the court. As can be seen, the method performs well: it detects the foreground regions, as suggested by the output frames in Figure 1 above.
Evaluation
User Study
User Approach
In order to gather accurate statistics from the system, a user study will be performed to find out how users approach the system and to determine how accurately they could reproduce it. It was planned that 6 tennis match videos would be used, so that appropriate detection and tracking algorithms could be developed and run as planned.
Comparison with existing algorithms
Particle and Kalman filters are recursive Bayesian estimators. The Kalman filter is usually used for linear systems with Gaussian noise, while the particle filter is used for non-linear systems. The Kalman filter uses a system model and sensor observations to estimate the current state from previous states.
The particle filter uses random sampling to generate different system states and then assigns high weights to those states that are supported by the sensor data. Particles with a weight less than a pre-defined threshold are discarded, and new random samples are generated to keep the number of particles constant.
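The step that keeps the number of particles constant can be sketched as below. Instead of the threshold rule described above, the sketch uses systematic resampling, a standard alternative in which low-weight particles tend to be dropped and high-weight particles are duplicated. The function name systematicResample is hypothetical, and the offset u, which would normally be drawn at random from [0, 1), is passed in so that the behaviour is repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns, for each of the n new particles, the index of the old particle
// it is copied from. Weights must be non-negative and need not sum to one.
std::vector<std::size_t> systematicResample(const std::vector<double>& weights, double u) {
    assert(!weights.empty());
    assert(u >= 0.0 && u < 1.0);
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        assert(weights[i] >= 0.0);
        total += weights[i];
    }
    assert(total > 0.0);
    const std::size_t n = weights.size();
    const double step = total / static_cast<double>(n);  // spacing of the sample points
    std::vector<std::size_t> picked;
    picked.reserve(n);
    double cumulative = weights[0];
    std::size_t i = 0;
    for (std::size_t k = 0; k < n; ++k) {
        const double target = (u + static_cast<double>(k)) * step;
        // Advance to the particle whose cumulative weight covers the target.
        while (cumulative <= target && i + 1 < n) {
            ++i;
            cumulative += weights[i];
        }
        picked.push_back(i);
    }
    return picked;
}
```

For weights {1, 1, 7, 1} and u = 0.5 the heavily weighted third particle is copied three times, while the first and last particles are dropped. This concentrates the particles where the sensor data supports the state.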
Kalman filters are best for estimating linear systems with Gaussian noise. If I model the location of the tennis player in this way, the filter gives a well-behaved Gaussian solution. Kalman filters have much lower computational requirements than particle filters, but are less flexible.
If the system does not fit nicely into a linear model, or the uncertainty does not look very Gaussian, a particle filter can handle almost any kind of model by discretising the problem into individual particles. Each particle is one possible state of the model, and a collection of a sufficiently large number of particles can represent almost any kind of probability and evidence distribution. If the evidence says that the player is in one of two crescent-shaped regions, or the system behaves very differently in some regions than in others, the particle filter is my estimator of choice.
The drawback is that many particles may be needed, and the number grows exponentially as the model gains more state variables. Once the particles have run out in one area of the solution space, it is hard to get them back, so the correct estimate may drop out permanently unless there is a large number of particles.
It is also difficult to allocate the correct amount of computation in advance; the only way to know how many particles are needed is through extensive data, simulation or testing. Kalman filters, by contrast, will terminate and give a good answer as long as the model fits.
Statistical Approaches
According to Shaikh (2014), the statistical characteristics of individual pixels have been used to overcome the shortcomings of basic background subtraction methods. These statistical methods are mainly inspired by background subtraction techniques, in that they keep and dynamically update statistics of the pixels that belong to the background image process. Foreground pixels are identified by comparing each pixel's statistics with those of the background model. This approach is becoming more popular because of its reliability in scenes that contain noise, illumination changes and shadows.
The statistical method of Stauffer and Grimson describes an adaptive background mixture model for tracking. In this technique, each pixel is modelled separately by a mixture of Gaussians, which are updated online by incoming image data. To decide whether a pixel belongs to the foreground or the background process, the Gaussian distributions of the mixture model for that pixel are evaluated.
The system uses a statistical background model in which each pixel is represented by its minimum and maximum intensity values and by the maximum intensity difference between any consecutive frames, observed during a training period in which the scene contains no moving objects.
Results
A total of 5 videos were tested with the system. The system tracks the players reasonably well, but when the players move faster than average the system starts to struggle to draw the contours in the right position. When play is faster, the system does not draw the contours directly on the players.
Video      Length   Number of Frames
Tennis     00:47    105
Tennis2    6:12     1995
Tennis3    6:30     2012
Tennis4    6:21     2030
Tennis5    7:42     2455
In the Tennis video sequence, I believe that because of the quality of the video the player at the top was not tracked; the other player was tracked.
The colour segmentation of the object from a sequence of live images using the suggested approach yielded satisfactory results. The segmentation also accounted for non-white illumination with shadows and highlights. The quality of the segmentation was much higher than that of hue-based segmentation, which assumes that the object colour is of a uniform hue.
Player candidate detection
Non-white illumination and specular surfaces required the hue range to be widened to accommodate the regions of the colour space that represent highlights. For instance, the image below presents a red ball and a yellow plastic duck viewed by a camera with automatic exposure under indoor fluorescent illumination.
Player detection
White regions of the ball surface are recognised as belonging to the object because they lie close to the dichromatic surface. Two spots, however, are not recognised as part of the ball. One, at the top, is very bright, with some pixels nearly white. Such pixels either lie inside the safe cylinder volume or were disregarded for being too far from the dichromatic surface, which is only an approximation.
The other spot is close to the plastic duck and is the result of inter-reflection: the yellow duck reflects yellowish light onto the ball, which caused the pixels to drift away from the dichromatic surface. When the duck was removed, the spot was correctly identified. In the same way, the black eye of the duck picked up a reddish inter-reflection from the ball, so certain eye pixels were recognised as belonging to the ball.
An approach was formulated for efficient colour object segmentation from a sequence of live images in real-time applications, together with a novel look-up table colour representation of dielectric objects based on the modelled behaviour of colour clusters. As proposed, this provided real-time performance while accounting for non-white illumination, variable viewing conditions and the operational parameters of the camera. Further development of the approach is needed to make the colour representation more efficient for varied materials and multi-coloured objects.
Colour digital images typically consist of red, green and blue values, each measured with 8 bits. The RGB colour model was therefore used to build a look-up table for fast colour object segmentation. The table can be thought of as a cube of zeros and ones, in which the ones represent pixels that belong to the object and lie on the dichromatic surface.
Because colour resolution is poor for desaturated colours close to the grey diagonal of the RGB cube, and for colours close to the black and white corners of the cube, the cube was restricted to a safe volume determined by a cylinder whose axis is the grey diagonal.
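The look-up table idea can be sketched as follows. This is only an illustration under stated assumptions: the table is filled from hand-picked sample pixels of the object rather than from a fitted dichromatic surface, each channel is quantised to 5 bits to keep the table small, the cylinder radius of 20 grey levels is arbitrary, and the black and white corners are not treated separately. All names are hypothetical.

```python
import math

BITS = 5                 # assumed quantisation: 5 bits per channel
SHIFT = 8 - BITS
SIDE = 1 << BITS         # 32 cells along each axis of the cube


def cell_index(r, g, b):
    """Map an 8-bit RGB triple to its cell in the flattened cube."""
    return ((r >> SHIFT) * SIDE + (g >> SHIFT)) * SIDE + (b >> SHIFT)


def distance_from_grey_axis(r, g, b):
    """Euclidean distance of a colour from the grey diagonal r = g = b."""
    mean = (r + g + b) / 3.0
    return math.sqrt((r - mean) ** 2 + (g - mean) ** 2 + (b - mean) ** 2)


def build_lut(object_pixels, grey_radius=20.0):
    """Mark the cells of sample object colours, skipping the grey cylinder."""
    lut = bytearray(SIDE ** 3)
    for r, g, b in object_pixels:
        if distance_from_grey_axis(r, g, b) > grey_radius:
            lut[cell_index(r, g, b)] = 1
    return lut


def segment(pixels, lut):
    """Classify each pixel with a single table look-up (1 = object)."""
    return [lut[cell_index(r, g, b)] for r, g, b in pixels]
```

Classification then costs one array access per pixel, which is what makes the representation suitable for real-time use. A sample such as (200, 30, 30) marks its cell, so a nearby red such as (203, 28, 31) is accepted, while near-grey samples are ignored.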
This method was implemented in OpenCV 2.4.6.1 and the results showed that it runs successfully in real time. The technique is relatively robust in the sense that it tracks all the moving objects in the video. It is very effective for applications such as security monitoring, as it detects any action in the captured view. It also works well when the moving objects and the background are similar in colour. Varying brightness, up to a certain level, has no effect on the performance of the algorithm.
Besides this, it also performs satisfactorily when the object undergoes partial occlusion. However, the background must be established for this algorithm to work; otherwise the contours may also contain background and the moving object cannot be traced effectively.
Previously, the examination of networks was based on the objective measurement of several criteria to determine the quality of the network. This quantification is known as the network's Quality of Service, a term that refers to the capability of the network to achieve more deterministic behaviour, so that data can be transported with minimal packet loss, minimal delay and maximal bandwidth. Note that Quality of Service takes no account of the perception of the user.
An augmented technique, known as Quality of Experience, takes the opinion of the user into consideration. The latter metric is subjective and includes human dimensions, as it integrates user perception, expectations and application experience in addition to network performance.
For this reason, a more holistic view of quality as perceived by end users, the Quality of Experience, is a growing area of research. When users experience a reduced quality of service, providers cannot afford to wait for complaints from customers, because such users will simply change service providers. Therefore, it is important that service providers have a means of continuously measuring the Quality of Experience and making the required improvements.
Several aspects can affect the perceived quality, including the reliability of the network, content preparation procedures and the performance of the terminal. Furthermore, the Quality of Service of multimedia streaming over IP networks, as is the case today, depends on many interdependent parameters.
Certain parameters can be adjusted, for instance the bandwidth and the resolution of the image, whereas others, such as the packet loss rate and delay, cannot. Such parameters need to be taken into consideration in order to raise the satisfaction of the end user. However, user satisfaction is affected not only by the parameters of the Quality of Service but also by subjective aspects of Quality of Experience, such as user interest and expectation. Several researchers have employed different methodologies according to the type of media, such as voice, motion tracking and imagery. For each type of media, there are multiple measurement methodologies with different computational and operational needs.
Conclusion
Tennis player tracking is a challenging subject in the area of computer vision. For this project I selected a fast and competent tracking technique that meets the requirement of tracking multiple targets and associating each track accurately with its target. The tracking system draws on a number of theoretical areas, including pattern recognition and image recognition.
Several researchers have developed systems which can track objects in real time in simple situations. The relevant techniques are presented for tennis sports analysis, including the detection of the players and tracking using a binary map function for two players. The system assists the analysis of tennis video at a high level.
In this review we investigated computer vision techniques for tracking tennis players in broadcast tennis videos. We argue that the mean-shift type of approach is suitable for tennis player tracking. The objective of the work presented in this review was to provide a tennis player tracking module for an automatic tennis video annotation system. This annotation system was briefly introduced, and techniques related to this review, including mean-shift, multiple hypothesis tracking, multiple object tracking by image subtraction and some existing player tracking algorithms, were reviewed.
Numerous ideas for object tracking and object detection systems were studied, and appropriate techniques for these parts were described in detail. Player tracking can be achieved by using various techniques such as the Kalman filter, mean-shift and multiple object tracking by image subtraction. The experiments that were carried out show that the mean-shift tracking technique is effective, efficient and capable of supporting a tennis player tracking system in different scenes. The mean-shift tracking system searches for the area of a video frame which appears most similar to a previously initialised model.
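The core mean-shift iteration can be sketched as below. This is an illustrative sketch only: it assumes that a per-pixel similarity map (for example, a colour histogram back-projection of the initialised model) has already been computed and is passed in as a 2D list called weights; the function name, the rectangular window and the integer rounding of the centre are all assumptions.

```python
def mean_shift(weights, cx, cy, half_w, half_h, max_iter=20):
    """Move a search window towards the densest region of a weight map.

    weights: 2D list of non-negative similarity values.
    (cx, cy): starting window centre; half_w, half_h: window half sizes.
    Returns the converged window centre (x, y).
    """
    height, width = len(weights), len(weights[0])
    for _ in range(max_iter):
        total = sum_x = sum_y = 0.0
        for y in range(max(0, cy - half_h), min(height, cy + half_h + 1)):
            for x in range(max(0, cx - half_w), min(width, cx + half_w + 1)):
                w = weights[y][x]
                total += w
                sum_x += w * x
                sum_y += w * y
        if total == 0:
            break  # nothing similar inside the window; stay where we are
        new_cx, new_cy = int(round(sum_x / total)), int(round(sum_y / total))
        if (new_cx, new_cy) == (cx, cy):
            break  # converged: the centroid no longer moves
        cx, cy = new_cx, new_cy
    return cx, cy
```

Between frames, the converged centre from the previous frame would be used as the starting centre for the next one, so the window follows the player as long as the displacement stays within the window size.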
The detection and tracking techniques have been implemented. This project has examined techniques to improve the performance of object detection and tracking algorithms for a multi-object tracking system. Motion segmentation is the primary step before the next stage in most tracking techniques. Improving the segmentation results, and extracting additional information through frame differencing and background subtraction, will help to improve the detection and tracking of the system. In addition, integrating a Kalman filter into the standard tracking system allows the filter to use continuously updated features, helps in principle to preserve the identity of the tracked object, and makes the tracking system more effective.
Recommendations for further work
There are some outstanding issues for the player tracking system. The tracking method must be able to recognise additional players in order to keep them consistently labelled. This can be a difficult task when players are similar in appearance. Similarly, tracking objects of different classes requires the tracking method to distinguish automatically between multiple classes of object.
The problem of player tracking and activity recognition is formulated as three phases: data acquisition, player tracking and activity recognition. The results of the tracking phase are passed to a final phase responsible for interpreting high level concepts such as human activity and behaviour.
Measurement approach to Quality of Experience
The two major methodologies of quality assessment are subjective and objective. The standard subjective measure used to assess and ensure a positive Quality of Experience is the Mean Opinion Score. This form of measurement is standardised in recommendations of the ITU-T.
It is defined as a numeric value from 1 to 5, that is, from bad to excellent. The main drawbacks of the approach are that it is costly and time consuming, cannot be applied in real time and has inadequate repeatability. These limitations motivate the development of objective methods that predict the subjective quality solely from physical attributes.
By definition, the objective approach is founded on mathematical or comparative mechanisms, which generate a quantitative measurement of one-way motion tracking quality. This approach is also useful for monitoring service quality, for network and terminal design, and for codec selection and optimization. Furthermore, objective approaches are either intrusive or non-intrusive: intrusive methodologies operate on signals, whereas non-intrusive methodologies operate on network or application parameters. Overall, intrusive methodologies are accurate but are not practicable for monitoring live traffic, since they require the original sequence; that is, they are full-reference quality measurements, whereas non-intrusive models do not require an original copy. Objective approaches normally ignore the type of content and the way the content is perceived by the Human Visual System. For instance, certain objective methodologies, such as Peak Signal to Noise Ratio, compare the original and received signals pixel by pixel to detect distortion.
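As an illustration of a full-reference (intrusive) objective metric, the sketch below computes the Peak Signal to Noise Ratio between an original and a received grey-level frame, using the standard definition PSNR = 10 log10(peak^2 / MSE). The function name and the nested-list frame representation are assumptions made for this example.

```python
import math


def psnr(reference, received, peak=255):
    """Peak Signal to Noise Ratio in dB between two equally sized frames.

    reference, received: 2D lists of grey levels; peak: largest possible
    pixel value (255 for 8-bit images).
    """
    squared_error = 0
    count = 0
    for ref_row, rec_row in zip(reference, received):
        for a, b in zip(ref_row, rec_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames: no distortion at all
    return 10 * math.log10(peak ** 2 / mse)
```

Because the computation needs the undistorted reference frame, it shows directly why such metrics cannot be applied to live traffic where only the received stream is available.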
References
M. Andriluka, S. Roth, and B. Schiele. People-tracking by detection. CVPR, 2008 [Accessed on 16 March 2016]
J. Berclaz, F. Fleuret, and P. Fua. Multiple object tracking using flow linear programming. 2009 [Accessed on 16 March 2016]
D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. CVPR, 2000 [Accessed on 20 March 2016]
N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. CVPR, 2005 [Accessed on 18 March 2016]
A. Ellis and J. Ferryman. PETS2010 and PETS2009 evaluation of results using individual ground truthed single views. AVSS, 2010 [Accessed on 21 March 2016]
H. Jiang, S. Fels, and J. J. Little. A linear programming approach for multiple object tracking. CVPR, 2007 [Accessed on 16 February 2016]
Z. Khan, T. Balch, and F. Dellaert. MCMC data association and sparse factorization updating for real time multitarget tracking with merged and multiple measurements [Accessed on 26 February 2016]
B. Leibe, K. Schindler, N. Cornelis, and L. Van Gool. Coupled detection and tracking from static cameras and moving vehicles. PAMI, 30(10), 2008 [Accessed on 1 April 2016]
S. Oh, S. Russell, and S. Sastry. Markov chain Monte Carlo data association for general multiple-target tracking problems, 2004 [Accessed on 5 April 2016]
K. Okuma, A. Taleghani, N. de Freitas, J. Little, and D. Lowe. A boosted particle filter: Multitarget detection and tracking, 2004 [Accessed on 30 March 2016]
D. B. Reid. An algorithm for tracking multiple targets. T Automat Contr, 1979 [Accessed on 31 March 2016]
S. Walk, N. Majer, K. Schindler, and B. Schiele. New features and insights for pedestrian detection, 2010 [Accessed on 28 March 2016]
B. Wu and R. Nevatia. Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet part detectors, 2007 [Accessed on 31 March 2016]
Soharab Hossain Shaikh, Khalid Saeed, Nabendu Chaki. 2014. Moving Object Detection Using Background Subtraction. [Accessed on 5 April 2016]
Gilbert, C. Li, W. 2013. Top down influence of visual processing. Nature Reviews Neuroscience.14, 5, 63-350. [Accessed on 5 April 2016]
Nassi, J. Callaway. E. 2009. Parallel processing strategies in visual systems. Nature reviews. Neuroscience .10, 5, 360-372. [Accessed on 01 April 2016]
Mackenzie, A. 2009. Intensive wireless signal processing in movement in calculation to envelopment. Environment and Planning.41, 6, 1294-1308. [Accessed on 02 April 2016]
Plass, S. Armin, D. 2006. Cellular cylic delay diversification of next generation of mobile systems. IEEE Vehicular Technology Conference.2, 1, 562-566. [Accessed on 03 April 2016]
Guo, D. Chih, W. 2008. Multiuser detection in sparse CDMA spread. IEEE Journal on Selected Areas in Communications.26, 3, 421-431. [Accessed on 04 April 2016]
Hsiao, C. Shin, C. Guizani, M. 2008. Next generation CDMA technologies in the actual approach of perfected orthogonal code generation. IEEE Transactions on Vehicular Technology. 57 (5), 2811-2833. [Accessed on 31 March 2016]
Fuh, Y. Mao, H. 2006. Orthogonal code in next generation CDMA system MPSK. Symposium on Wireless Pervasive Computing. 7, 2, 16-135. [Accessed on 01 April 2016]
Adachi, F. Gard, D. Takaoka, S. Takeda, K. 2005.Techniques of broadband CDMA. IEEE Wireless Communications.12, 2, 8-18. [Accessed on 31 March 2016]
Ranga, R. 2012. Multi user CDMA detector using Elliptic curve cryptography. International Journal of VLSI Design & Communication Systems.3, 1, 51-67. [Accessed on 1 April 2016]
Arash, M. Young, Y. (2002).Linear MMSE receiver in asynchronous random CDMA chip pulse shapes. IEEE Transactions on Vehicular Technology.51,5, 1072-1086. [Accessed on 10 April 2016]
Wang, X. Sheng, W. 2007. Collective signal processing in distributed wireless networks. Journal of Parallel and Distributed Computing.67,5,501-151. [Accessed on 10 April 2016]
Forse, U. 2009.Toolbox for signal processing.1,1, 12-20. [Accessed on 10 April 2016]
Soharab Hossain Shaikh, Khalid Saeed, Nabendu Chaki. 2014. Moving Object Detection Using Background Subtraction. [Accessed on 10 April 2016]
Cyganek, Boguslaw, 2013, Object Detection and Recognition in Digital Images: Theory and Practice, [Accessed on 5 April 2016]
http://opencv-srf.blogspot.co.uk/ Accessed on 21/04/2016
http://opencv-java-tutorials.readthedocs.io/en/latest/ Accessed on 20/04/2016
Appendices
Appendix 1: Terms of Reference
NORTHUMBRIA UNIVERSITY
Individual Project
CM0645
oguz.akdemir
11/13/215
Fahrettin Oguz Akdemir
General Computing Project
Player Tracking for Tennis Video Analysis
To demonstrate a video that shows automatic detection and tracking of the tennis players.
Project Title
Player Tracking for Tennis Video Analysis.
Background to Project
Intelligent sports video analysis systems have many well-known applications. Previously, people focused on high-level analysis such as video summarization and shot event recognition. Recently, with the development of accurate object detection and tracking techniques, the trend has moved towards a more detailed analysis of sports videos, such as providing game statistics (for example, the average speed and running distance of players) in order to analyse competitors' strengths and weaknesses. TV broadcasting companies also benefit by using these systems to create star-camera views and video streams that highlight star players. Statistics are very important for analysing the performance of sports players. For instance, in tennis, people record how many points a team or a player makes. These statistics are recorded by human annotators during the game. In professional sports such as the NBA, a team may hire 10 to 20 staff to perform this task, depending on how many pieces of information they want to collect. With tracking systems, this expensive process can be automated using cameras and computers. According to the developers, automating these processes with an intelligent system would significantly increase production speed and reduce cost.
However, establishing such an automatic player tracking system is not an easy job. In sports videos, players wear similar uniforms of the same colours, and they have similar skin colour and body shapes, which usually confuses the tracking system. In broadcast sports videos, occlusions are frequent and sometimes long. Without sufficient image evidence, tracking systems are likely to lose track of players during occlusions. Many tracking systems assume that targets have a certain motion pattern. Such patterns are hard to rely on in sports videos because players usually have complicated motion patterns. Most existing systems can only recognize players in close-up view, where facial features are clear.
Object tracking over a sequence of images is the problem of finding the shape of an object that was given in the first frame in the subsequent frames of the sequence. This section shows how segmentation with moment constraints can be generalized with a few modifications to a method for object tracking. [1]
For instance, one of the current techniques is multiple object tracking, which performs automatic detection and tracking of objects that are in action in a video. Detection of moving objects and multiple object tracking are important components of many computer vision applications, including activity recognition, traffic monitoring and automotive safety. The problem of multiple object tracking can be divided into two parts: detecting moving objects in each frame, and associating the detections that correspond to the same object over time. The detection of moving objects uses a background subtraction algorithm based on Gaussian mixture models. Morphological operations are applied to the resulting foreground mask to eliminate noise. Finally, blob analysis detects groups of connected pixels, which are likely to correspond to moving objects.
The association of detections to the same object is based solely on motion. The motion of each track is estimated by a Kalman filter. The filter is used to predict the track's location in each frame and to determine which detection is assigned to each track.
The system combines computer vision and machine learning algorithms to automate the tracking: it detects players by learning their shapes and tracks them based on their movements. The playing field is registered by a homography in each image in order to locate the foot positions of players on the field. Finally, the appearance of each player is learned from automatically generated tracking results, and this information is propagated across all player tracks using a conditional random field. We localize players using a player detector and categorize detections based on tennis players. This technology can be used to gather game statistics (e.g. the average speed and running distance of tennis players) in order to analyse competitors' strengths and weaknesses.
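The prediction and correction steps of such a filter can be sketched for one image coordinate as follows. This is a minimal constant-velocity example written for illustration, not the filter from the cited toolbox: the class name Kalman1D, the initial covariance and the noise variances are assumptions, and a real tracker would run one such filter per axis (or a single four-state filter) for every track.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for a single coordinate.

    State is [position, velocity]; only the position is measured.
    All numeric defaults are illustrative assumptions.
    """

    def __init__(self, position, dt=1.0, process_var=1e-3, meas_var=1.0):
        self.dt = dt
        self.q = process_var
        self.r = meas_var
        self.x = [float(position), 0.0]
        self.P = [[100.0, 0.0], [0.0, 100.0]]  # large initial uncertainty

    def predict(self):
        """Advance the state by one frame and return the predicted position."""
        dt = self.dt
        position, velocity = self.x
        self.x = [position + dt * velocity, velocity]
        (a, b), (c, d) = self.P
        # P = F P F^T + Q with F = [[1, dt], [0, 1]]
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, measured_position):
        """Correct the state with a detection assigned to this track."""
        (a, b), (c, d) = self.P
        s = a + self.r                 # innovation variance
        k0, k1 = a / s, c / s          # Kalman gain
        residual = measured_position - self.x[0]
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

In each frame the tracker would call predict() for every track, compare the predictions with the detections to decide the assignment, and call update() only for tracks that received a detection; during an occlusion the prediction alone carries the track forward.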
In this project we will develop an intelligent system to track and detect players from video sequences. The system will automatically localize and track players over time, estimate their locations on the court, and finally detect the players.
The proposed tracking and detection system has many potential applications. In this project, we only implement three of them: automatic collection of player statistics, detection and tracking of the players.
Proposed work
This project introduces an intelligent system that detects and tracks players and collects game statistics. We focus on video frames taken from a court view because they provide more information about the players' locations on the court and have more applications in collecting game statistics. The contributions of this project are as follows:
In order to track people in a fully automated manner, it is necessary to first detect or re-acquire their presence in individual video frames. This topic is closely related to pedestrian detection, which is often considered as a kind of object recognition. The recognition problem can be broken down along several axes. Object detection involves quickly scanning an image to determine where a match may occur. If we have a specific rigid object we are trying to recognize (instance recognition), we can search for characteristic feature points and verify that they align in a geometrically plausible way. [2]
Computer Vision System Toolbox provides video tracking algorithms, such as mean shift (CAMShift), Kanade-Lucas-Tomasi (KLT) and multiple object tracking. I will be considering these techniques for tracking multiple objects. The project will also consider Kalman filtering and the Hungarian algorithm for assigning object detections to tracks.
Object detection and tracking has many applications in sports video analysis. For instance, when collecting game statistics automatically, we can use the trajectories of players to estimate their average speeds and running distances, and analyse their moving styles. In addition, tracking also assists other components in the proposed video analysis system. For instance, tracking provides image patches that contain appearance information of players. Tracking will provide temporal aggregation of information by grouping image patches over time into an entity.
In computer vision applications, Kalman filters are used for object tracking to predict an object's future location, to account for noise in an object's detected location, and to help associate multiple objects with their corresponding tracks, as the example figure below shows. [4]
The output of the Kalman filter is indicated by the red circles and the object detection is indicated in black. When the ball is occluded and there are no detections, the filter is used to predict its location. [4]
Automatic collection of player statistics is very important for analysing the performance of sports players. In tennis, people record how many points a player makes. These statistics are recorded by human annotators during the game. With the tracking system produced in this project, this expensive process can be automated by using cameras and computers. Given a video sequence, we can first detect and track the players. Then, we can use the estimated homography transformation to compute the players' trajectories on the tennis court. This enables us to collect statistics such as the players' running distances and average speeds.
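The last two steps can be sketched as follows, assuming that a 3x3 homography H from image pixels to court coordinates in metres has already been estimated and that the player's foot position is available in every frame. The function names and the example matrix are hypothetical.

```python
import math


def to_court(H, x, y):
    """Map an image point to court coordinates with a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def running_distance(points):
    """Total length of a trajectory given as a list of (x, y) positions."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def average_speed(points, fps):
    """Distance divided by elapsed time; one trajectory point per frame."""
    duration = (len(points) - 1) / fps
    return running_distance(points) / duration if duration > 0 else 0.0


# Hypothetical homography: a pure scaling of 1 pixel = 0.01 m.
H_EXAMPLE = [[0.01, 0.0, 0.0],
             [0.0, 0.01, 0.0],
             [0.0, 0.0, 1.0]]
```

For instance, foot positions (0, 0), (300, 0) and (300, 400) in the image map to (0, 0), (3, 0) and (3, 4) on the court under H_EXAMPLE, giving a running distance of 7 m. Distances must be measured after the mapping, because equal pixel displacements near and far from the camera correspond to different distances on the court.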
Tracking is a broad field in computer vision, and there are numerous algorithms and systems developed in the past to challenge this problem. For tracking multiple targets in a video sequence, the most popular approach is to apply multiple independent single-target trackers to track different targets. Specifically, one can initiate a single-target tracker by the initial location of the target, and then ask it to follow the player throughout the video sequence. The initial location can be given by hand or by an automatic object detection system.
Figure 1 shows automatic detection of playing scenes and the automatic tracking of the tennis players.
With the appearance of accurate object detectors, modern multi-target tracking systems generally follow a tracking-by-detection approach. These systems first run an object detector to locate people in images, and then use some mechanism to link these detections into tracks.
Aims
Create a sample movie that shows automatic tracking of the tennis players. Investigate the current algorithms for detecting and tracking objects and for collecting game statistics.
Objectives
• Investigate the project requirements – I will investigate the main system requirements of the project.
• Learn computer vision based algorithm techniques – Investigate and learn computer vision based algorithms for tracking and for collecting statistics for the project.
• Investigate current algorithms for tracking players – Research algorithms that have been used in the past for tracking an object in sports videos or video streams.
• Learn OpenCV software – I will read about and practise with the OpenCV software for the project.
• Implement a tracking system based on the appropriate algorithms and methods.
• Learn MATLAB as an alternative.
• Learn OpenTL Library to develop the system as an alternative
• Collect game statistics by using appropriate algorithms and methods
• Test the product – I will make sure everything is working correctly according to the requirements. Tracking objects backwards and forwards will be tested. The detection and segmentation of moving objects will be tested. Tracking of the object over sequences of images using pixel-based colour will also be tested.
• Evaluate the product – Evaluate the product against ground truth data and assess the robustness of the tracking algorithm.
Structure/contents of report
• Title Page
• List of Contents
• Abstract
• Introduction
o Background of the sports analysis project
o Overview of aims and objectives
• Analysis
o Establish project needs
o Literature Review
o Tools and Techniques
• Synthesis
o Learn Computer vision and application based algorithms.
o Implementation of Computer vision based tracking algorithm.
• Evaluation
o Evaluation of the product
o Evaluate the project with ground truth data and strength of tracking algorithms
o Conclusions and recommendations
• References
• Bibliography
• Appendices
Skills
Java Programming 1 & 2 I learnt Java programming during first year from the modules Programming 1 and 2
Programming Design & Development Modules from Programming Design & Development
System Analysis Modules from System Analysis
Professional Software Engineering Practice Modules from Professional Software Engineering Practice
Small Embedded systems Modules from Small Embedded Systems which introduce C language required for this project
Learn OpenCV From http://opencv-java-tutorials.readthedocs.org/en/latest/
Learn MATLAB Learn MATLAB as an alternative option for the project.
Learn OpenTL http://www.opentl.org/library.html
Resources
• Video– To be able to capture game and gather information about the game statistics.
• Computer – To download the tennis game library for the implementation of the system.
• Microsoft Visual Studio- software to implement the system.
• OpenCV – Software library to use to create the system.
• OpenTL – Library to use as an alternative to develop the system.
• MATLAB – software to use as an alternative for implementation of the system.
Marking Scheme
My project will be a General computing Project.
Product 30%
Fitness for purpose 50%
Build Quality 50%
Report 60%
Abstract & Introductions 5%
Analysis 30%
Synthesis 30%
Evaluation 30%
Presentation 5%
Viva 10%
Presentation 100%
Project Plan
Bibliography
CYGANEK.B (2013) Object Detection and Recognition in Digital Images: Theory and Practice
Accessed on 05/11/2015
[1]. Giovanni Maria Farinella (2013) Advanced Topics in Computer Vision (p231)
Accessed on 15/11/2015
[2]. Richard Szeliski (2011) Computer Vision Algorithms and Applications (p577)
Accessed on 15/11/2015
[4].http://uk.mathworks.com/help/vision/examples/using-kalman-filter-for-object-tracking.html
Accessed on 16/11/2015
[6].http://uk.mathworks.com/help/vision/examples/motion-based-multiple-object-tracking.html
Accessed on 16/11/2015
http://plaza.ufl.edu/lvtaoran/object%20tracking.htm
Accessed on 17/11/2015
http://www.opentl.org/library.html
Accessed on 17/11/2015
http://www.intorobotics.com/how-to-detect-and-track-object-with-opencv/
Accessed on 17/11/2015
http://opencv-java-tutorials.readthedocs.org/en/latest/09%20-%20Object%20Detection.html
Accessed on 17/11/2015