From what I've understood, tracking algorithms predict where a given object will be in the next frame (after object detection has already been performed). The object is then detected again in that next frame. What isn't clear to me is how the tracker knows to associate the object in the second frame with the one in the first, especially when there are multiple objects in the frame.
I've seen in a few places that a cost matrix is created from the Euclidean distances between each prediction and all detections, and the problem is then framed as an assignment problem, solved with the Hungarian algorithm.
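That cost-matrix-plus-assignment step can be sketched in a few lines of Python. The coordinates below are made up for illustration; `scipy.optimize.linear_sum_assignment` solves the same minimum-cost assignment problem the Hungarian algorithm solves:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Predicted positions of 3 tracks (e.g. from a Kalman filter) and
# 3 detections in the new frame, as (x, y) coordinates.
predictions = np.array([[10.0, 12.0], [40.0, 40.0], [70.0, 15.0]])
detections = np.array([[41.0, 39.0], [11.0, 13.0], [69.0, 16.0]])

# Cost matrix: Euclidean distance between every prediction/detection pair.
cost = np.linalg.norm(predictions[:, None, :] - detections[None, :, :], axis=2)

# Minimum-cost one-to-one assignment (Hungarian-style matching).
track_idx, det_idx = linear_sum_assignment(cost)
for t, d in zip(track_idx, det_idx):
    print(f"track {t} -> detection {d} (distance {cost[t, d]:.2f})")
```

Each track is matched to the detection that minimizes the total cost across all pairs, which is exactly how the tracker decides that "this detection is the same object as that track."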
Is my understanding of tracking correct? Are there other ways of establishing that an object in one frame is the same as an object in the next frame?
In a typical tracking pipeline, the Kalman filter is used to predict and update the location and velocity of an object, given a video stream and detections on each of the frames. At some point, one may also want to account for camera tilt and pan by adjusting the object's predicted location according to the camera's angles of movement.
Robust extended Kalman filter: the extended Kalman filter arises by linearizing the signal model about the current state estimate and then using the linear Kalman filter to predict the next estimate.
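As a rough illustration of that linearization step (this sketch is not from the original text; the measurement model `h` and all noise values are made-up assumptions), the EKF computes the Jacobian of the nonlinear measurement function at the predicted state and then runs the standard linear update:

```python
import numpy as np

# Minimal extended-Kalman-filter step: state is (position, velocity) in 1D,
# the motion model is linear, and the measurement is a nonlinear function
# h(x) = sqrt(1 + pos^2) (e.g. range to a sensor at a fixed offset).
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
Q = 0.01 * np.eye(2)                    # process noise covariance (assumed)
R = np.array([[0.1]])                   # measurement noise covariance (assumed)

def h(x):                               # nonlinear measurement model
    return np.array([np.sqrt(1.0 + x[0] ** 2)])

def H_jacobian(x):                      # Jacobian of h, evaluated at x
    return np.array([[x[0] / np.sqrt(1.0 + x[0] ** 2), 0.0]])

def ekf_step(x, P, z):
    # Predict with the (linear) motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Linearize the measurement model about the predicted state,
    # then apply the ordinary linear Kalman update.
    H = H_jacobian(x)
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x + K @ (z - h(x))
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 1.0]), np.eye(2)
x, P = ekf_step(x, P, z=np.array([np.sqrt(2.0)]))
```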
Deep SORT[2] is a recent tracking algorithm that extends Simple Online and Realtime Tracking (SORT)[3] and has shown remarkable results on the Multiple Object Tracking (MOT) problem. In the MOT setting, each frame contains more than one object to track.
The Kalman Filter (KF) is a set of mathematical equations that together implement a predictor-corrector type of estimator, one that is optimal in the sense that it minimizes the estimated error covariance when certain presumed conditions are met.
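The predictor-corrector structure can be made concrete with a short sketch (the model matrices and noise values here are illustrative assumptions, not from the original text): a constant-velocity model tracks a 2-D position from noisy detections, alternating a predict step with a correct step per frame:

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # state: [x, y, vx, vy]
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # we only observe position
Q = 0.01 * np.eye(4)                        # process noise covariance (assumed)
R = 1.0 * np.eye(2)                         # measurement noise covariance (assumed)

def predict(x, P):
    """Predictor: propagate state and uncertainty through the motion model."""
    return F @ x, F @ P @ F.T + Q

def correct(x, P, z):
    """Corrector: blend the prediction with the detection z."""
    S = H @ P @ H.T + R                     # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Object moving diagonally at ~1 px/frame; large initial uncertainty.
x, P = np.zeros(4), 10.0 * np.eye(4)
for z in [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([3.0, 3.0])]:
    x, P = predict(x, P)
    x, P = correct(x, P, z)
```

After a few frames the estimated position converges toward the detections and the filter picks up the object's velocity, which is what makes the next-frame prediction useful for association.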
Your understanding is correct. You have described a simple cost function, which is likely to work well in many situations. However, there will be times when it fails.
Assuming you have the computational resources, you can make your tracker more robust by making the cost function more sophisticated.
The simplest thing you can do is take into account the error covariance of the Kalman filter, rather than just using the Euclidean distance. See the distance equation in the documentation for the vision.KalmanFilter object in MATLAB. Also see the Motion-based Multiple Object Tracking example.
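The idea behind such a covariance-aware distance can be sketched as follows (this is an illustration of the concept, not the actual MATLAB implementation): instead of the raw Euclidean distance, use the Mahalanobis distance, which weights each axis by how uncertain the filter is about it via the innovation covariance:

```python
import numpy as np

def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance between detection z and prediction z_pred,
    given innovation covariance S (= H P H' + R in Kalman filter terms)."""
    innovation = z - z_pred
    return float(innovation @ np.linalg.inv(S) @ innovation)

z_pred = np.array([10.0, 10.0])
# Suppose the filter is much more uncertain along x than along y.
S = np.array([[25.0, 0.0],
              [0.0,  1.0]])

a = np.array([15.0, 10.0])   # 5 px off along the uncertain x axis
b = np.array([10.0, 15.0])   # 5 px off along the confident y axis

# Same Euclidean distance, very different Mahalanobis distance:
d_a = mahalanobis_sq(a, z_pred, S)   # ~1.0  (5^2 / 25)
d_b = mahalanobis_sq(b, z_pred, S)   # ~25.0 (5^2 / 1)
```

A detection that deviates along a direction the filter is confident about is penalized much more heavily, which makes the assignment step far more discriminating than plain Euclidean distance.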
You can also include other information in the cost function. You could account for the fact that the size of the object should not change too much between frames, or that the object's appearance should stay the same. For example, you could compute color histograms of your detections, and define your cost function as a weighted sum of the "Kalman filter distance" and some distance between color histograms.
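A minimal sketch of such a weighted cost (the weights, histogram bins, and the histogram-intersection distance here are illustrative choices, not prescribed by the answer):

```python
import numpy as np

def hist_distance(h1, h2):
    """1 minus histogram intersection; 0 for identical normalized histograms."""
    return 1.0 - np.minimum(h1, h2).sum()

def combined_cost(motion_dist, h_track, h_det, w_motion=0.7, w_appear=0.3):
    """Weighted sum of a motion distance and an appearance distance."""
    return w_motion * motion_dist + w_appear * hist_distance(h_track, h_det)

# Two candidate detections at the SAME motion distance from a track;
# the appearance term breaks the tie.
h_track = np.array([0.5, 0.3, 0.2])   # track's stored color histogram
h_same  = np.array([0.5, 0.3, 0.2])   # detection with matching colors
h_diff  = np.array([0.1, 0.1, 0.8])   # detection with different colors

c_same = combined_cost(2.0, h_track, h_same)
c_diff = combined_cost(2.0, h_track, h_diff)
```

With motion alone the two candidates would be indistinguishable; the appearance term makes the color-consistent detection cheaper to assign, which is essentially how appearance cues resolve ambiguous associations.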