I've spent some months studying and experimenting with keypoint detection, description, and matching. Lately, I've also been looking into the concepts behind augmented reality, specifically "markerless" recognition and pose estimation.
Luckily, I've found that those concepts are still widely used in this setting. A common pipeline for creating basic augmented reality is the following, without going into the details of each algorithm:
While capturing a video, at every frame...
- Get some keypoints and create their descriptors
- Find some matches between these points and the ones inside a previously saved "marker" (like a photo)
- If there are enough matches, estimate the pose of the visible object and play with it
That is a very simplified version of the procedure used, for example, by this student(?) project.
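As a rough illustration, here is a minimal sketch of that pipeline in Python with OpenCV. The marker file name, the ORB detector, and the match threshold are my own placeholder choices, not details from the project, and a homography stands in for full pose estimation:

```python
import cv2
import numpy as np

# "marker.jpg" and ORB are placeholder choices, not the project's.
marker = cv2.imread("marker.jpg", cv2.IMREAD_GRAYSCALE)
cap = cv2.VideoCapture(0)

orb = cv2.ORB_create()
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
kp_marker, des_marker = orb.detectAndCompute(marker, None)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # 1) get keypoints and descriptors for the current frame
    kp_frame, des_frame = orb.detectAndCompute(gray, None)
    if des_frame is None:
        continue
    # 2) match against the previously saved marker
    matches = bf.match(des_marker, des_frame)
    # 3) if there are "enough" matches, estimate the pose
    if len(matches) > 15:  # arbitrary threshold
        src = np.float32([kp_marker[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_frame[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        # A homography stands in for full 3D pose estimation here.
        H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```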
Now the question: during my research, I've also come across another method called "optical flow". I'm still at the beginning of studying it, but first I would like to know how different it is from the previous method. Specifically: how do the two approaches compare, and which one do commercial augmented reality solutions use?
Thanks for your cooperation.
Optical flow (OF) is a method built around the so-called "brightness constancy assumption": you assume that pixel intensities (up to some delta) do not change between frames, but only shift, and you solve the equation I(x, y, t) = I(x + dx, y + dy, t + dt).
The first-order Taylor expansion is: I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + I_x * dx + I_y * dy + I_t * dt.
Substituting this back into the constancy equation gives I_x * dx + I_y * dy + I_t * dt = 0, which you solve to get dx and dy, the shifts for every pixel (an extra constraint, such as local smoothness, is needed, since this is one equation in two unknowns).
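For a concrete example, here is a minimal dense optical flow computation using OpenCV's Farneback implementation (the frame file names are placeholders):

```python
import cv2

# Placeholder frames: two consecutive grayscale images.
prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
nxt = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# flow[y, x] = (dx, dy): the estimated shift of every pixel from prev to nxt.
flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2,
                                    flags=0)
```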
Optical flow is mainly used for tracking and odometry.
Update: if applied not to the whole image but to a patch, optical flow is almost the same as the Kanade-Lucas-Tomasi (KLT) tracker.
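This sparse, patch-based variant is what OpenCV exposes as the pyramidal Lucas-Kanade tracker. A minimal sketch, assuming two consecutive frames saved as placeholder files:

```python
import cv2

# Placeholder frames: two consecutive grayscale images.
old_gray = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
new_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Pick corner-like points worth tracking.
p0 = cv2.goodFeaturesToTrack(old_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)

# Track the patch around each point from old_gray into new_gray.
p1, status, err = cv2.calcOpticalFlowPyrLK(
    old_gray, new_gray, p0, None,
    winSize=(15, 15),   # size of the patch around each point
    maxLevel=2)         # pyramid levels, for larger motions

# Keep only the points that were tracked successfully.
good_new = p1[status.ravel() == 1]
good_old = p0[status.ravel() == 1]
```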
The difference between this method and feature-based methods is density: with feature points you usually get the displacement only at the detected keypoints, while dense optical flow estimates it for the whole image.
The drawback is that vanilla OF works only for small displacements. To handle larger ones, one can downscale the image, compute OF on the smaller version, and refine at the original resolution: the "coarse-to-fine" method.
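To make the coarse-to-fine idea concrete, here is a rough, purely illustrative sketch of a manual pyramid wrapped around OpenCV's Farneback routine (which in practice already does this internally via its levels parameter):

```python
import cv2

def coarse_to_fine_flow(prev, nxt, levels=3):
    # Build Gaussian pyramids; level 0 is the full-resolution image.
    pyr_prev, pyr_next = [prev], [nxt]
    for _ in range(levels - 1):
        pyr_prev.append(cv2.pyrDown(pyr_prev[-1]))
        pyr_next.append(cv2.pyrDown(pyr_next[-1]))

    flow = None
    for lvl in reversed(range(levels)):  # coarsest level first
        p, n = pyr_prev[lvl], pyr_next[lvl]
        if flow is not None:
            # Upsample the coarser flow and double its magnitude.
            flow = cv2.resize(flow, (p.shape[1], p.shape[0])) * 2.0
        flags = cv2.OPTFLOW_USE_INITIAL_FLOW if flow is not None else 0
        # Refine at this level, using the upsampled flow as the initial guess.
        flow = cv2.calcOpticalFlowFarneback(p, n, flow, 0.5, 1, 15,
                                            3, 5, 1.2, flags)
    return flow
```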
One can also replace the "brightness constancy assumption" with, e.g., a "descriptor constancy assumption" and solve the same equation with a descriptor value instead of the raw intensity; SIFT Flow is an example of this.
Unfortunately, I don't know much about commercial augmented reality solutions and cannot answer the last question.
Optical flow computation is slow, while feature detection has seen major speed improvements recently. Commercial augmented reality solutions require real-time performance, so it is hard to apply optical flow based techniques (unless you use a good GPU). AR systems mostly use feature-based techniques. Most of the time their aim is to recover the 3D geometry of the scene, which can be robustly estimated from a set of points. Other differences have been mentioned by old-ufo.
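For instance, once 2D-3D point correspondences between the scene and the current frame are available, the pose can be estimated robustly in a single RANSAC-based call; the point arrays and camera matrix below are placeholder data:

```python
import cv2
import numpy as np

# Placeholder data: known 3D points (e.g., on the marker) and their 2D
# projections in the current frame, e.g. obtained from feature matches.
object_points = np.random.rand(20, 3).astype(np.float32)
image_points = np.random.rand(20, 2).astype(np.float32)
camera_matrix = np.array([[800, 0, 320],
                          [0, 800, 240],
                          [0,   0,   1]], dtype=np.float32)

# RANSAC discards outlier correspondences, making the pose estimate robust.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, camera_matrix, None)
```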