I've been trying to implement a simple SfM pipeline in OpenCV for a project and I'm having a bit of trouble.
It's for uncalibrated cameras so I don't have a camera matrix (Yes, I know it's going to make things much more complicated and ambiguous).
I know that I should be reading a lot more before attempting something like this but I'm quite hard pressed for time and I'm trying to read about things as I come across them.
Here's my current pipeline, gathered from a number of articles, code samples and books. I've posted questions about specific steps after it, and I'd also like to know whether there's something I'm missing or doing wrong.
Q) Do I need to even do this? Is it overkill, or should I be doing something else, like estimating a homography, to avoid the degenerate case of the 8-point algorithm?
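For reference, this is roughly what I understand the fundamental-matrix step to be. A pure-NumPy sketch of the normalized 8-point algorithm, just to show my understanding; in practice I'd call OpenCV's `findFundamentalMat` with the RANSAC flag instead:

```python
import numpy as np

def normalize(pts):
    # Translate the centroid to the origin and scale the mean
    # distance from it to sqrt(2) (Hartley normalization)
    c = pts.mean(axis=0)
    d = np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    s = np.sqrt(2) / d
    T = np.array([[s, 0, -s * c[0]],
                  [0, s, -s * c[1]],
                  [0, 0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))])
    return (T @ ph.T).T, T

def eight_point(pts1, pts2):
    """Normalized 8-point estimate of the fundamental matrix
    from Nx2 arrays of corresponding image points (N >= 8)."""
    x1, T1 = normalize(pts1)
    x2, T2 = normalize(pts2)
    # Each correspondence x2^T F x1 = 0 gives one row of A f = 0
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0]) @ Vt
    # Undo the normalization
    return T2.T @ F @ T1
```

This is exactly the case that degenerates when the scene is planar, which is what my homography question is about.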
Next, I need to choose 2 images to begin the reconstruction with.
Q) Is this right? Or should I just triangulate the points and then determine whether they are in front of the camera, or does that work out to the same thing?
Set P=[I|0] and P1=[R|T], perform triangulation, and store the 3D points in some data structure. Also store the P matrices.
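The triangulation I'm using is plain linear (DLT) triangulation, roughly like this (a sketch for one correspondence; OpenCV's `triangulatePoints` does the same job for whole point sets):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a single correspondence.
    P1, P2 are 3x4 projection matrices; x1, x2 are 2D image points.
    Returns the inhomogeneous 3D point."""
    # Each view contributes two rows of A X = 0
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]
```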
Run a bundle adjustment step with a fairly large number of iterations to minimize the reprojection error.
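By "minimize error" I mean the total reprojection error. The residual I'd hand to a least-squares solver (e.g. `scipy.optimize.least_squares`) looks roughly like this; it's a projective-BA sketch where each camera is a free 3x4 matrix, and the parameter packing and the `obs` format are my own convention:

```python
import numpy as np

def reprojection_residuals(params, n_cams, n_pts, obs):
    """Reprojection residuals for projective bundle adjustment.
    params packs each 3x4 camera (12 values) followed by each 3D
    point (3 values); obs is a list of (cam_idx, pt_idx, observed_xy)."""
    Ps = params[:n_cams * 12].reshape(n_cams, 3, 4)
    Xs = params[n_cams * 12:].reshape(n_pts, 3)
    res = []
    for ci, pi, xy in obs:
        X = np.append(Xs[pi], 1.0)
        x = Ps[ci] @ X
        # Residual = projected point minus observed point
        res.extend(x[:2] / x[2] - xy)
    return np.asarray(res)
```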
It gets a little hazy from here and I'm pretty sure I'm messing something up.
Choose the next image to add based on which one observes the most already-triangulated 3D points.
Triangulate this new image against all (I know I don't need to do it with ALL of them) of the images triangulated so far, using their stored P matrices as P = PMatrices[ImageAlreadyTriangulated] and the P1 obtained above.
Q) Is it really as simple as just reusing the original values of P? Will that bring everything into the same coordinate space? That is, will the newly triangulated points be in the same coordinate system as those obtained from the initial P and P1, or do I need to apply some transformation here?
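For context, this is how I'd estimate the new image's P1 from 2D-3D correspondences against the already-reconstructed points (a DLT resectioning sketch; the function name is my own). My understanding is that because the 3D points are expressed in the existing reconstruction's frame, the recovered matrix, and anything triangulated with it, should live in that same frame:

```python
import numpy as np

def resect_camera(X3d, x2d):
    """Estimate a 3x4 projection matrix by DLT from >= 6
    correspondences between 3D points X3d (Nx3) and their
    2D observations x2d (Nx2) in the new image."""
    rows = []
    for X, x in zip(X3d, x2d):
        Xh = np.append(X, 1.0)
        # Two linear constraints per correspondence from x cross (P X) = 0
        rows.append(np.concatenate([Xh, np.zeros(4), -x[0] * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -x[1] * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)
```

The result is only defined up to scale (and sign), like any homogeneous projection matrix.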
From the points we obtain from triangulation, only add those 3D points that we don't already have stored.
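The bookkeeping I have in mind for "don't add duplicates" keys points by feature track: if either observation of a match already belongs to a stored 3D point, extend that track instead of adding a new point. A sketch (all names are my own):

```python
class PointRegistry:
    """Keeps one 3D point per feature track so re-triangulating the
    same match from another image pair doesn't create duplicates.
    Observations are keyed by (image_id, feature_id)."""

    def __init__(self):
        self.points = []          # list of 3D points
        self.obs_to_point = {}    # (image_id, feature_id) -> point index

    def add(self, obs_a, obs_b, xyz):
        # Reuse an existing point if either observation is already known
        idx = self.obs_to_point.get(obs_a, self.obs_to_point.get(obs_b))
        if idx is None:
            idx = len(self.points)
            self.points.append(xyz)
        self.obs_to_point[obs_a] = idx
        self.obs_to_point[obs_b] = idx
        return idx
```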
General questions:
I know this must have made for a long read. Thank you very much for your time :)
The pipeline you propose is generally correct, except for 3.1.
2.2) Correct. RANSAC picks points at random to estimate the fundamental matrix and is robust to outliers (as long as you have enough valid matches, of course). Homography outliers are NOT necessarily bad matches, so a homography should not be used to filter matches.
3.1) Incorrect: homography inliers are matches that are perfectly aligned between the two views, for example points that exhibit proportional or similar movement between them. What this means is that the higher the number of homography inliers in a view pair, the LESS suitable that pair is as a seed for the baseline triangulation. The camera matrices recovered from a fundamental matrix estimated with RANSAC on such a pair will most likely come out inaccurate, and the reconstruction will never pick up. What you want to do instead is start with the view pair that has the LOWEST percentage of homography inliers while still having a high number of matches. Unfortunately, the image pairs with the highest number of matches usually also have the highest number of homography inliers, because those pairs typically contain very little camera movement...
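To make the selection rule concrete, here is one way you could encode it (a sketch; `pair_stats` would be filled from your matcher and from homography inlier counts, e.g. via `cv2.findHomography` with RANSAC, and `min_matches` is an arbitrary threshold you'd tune):

```python
def pick_seed_pair(pair_stats, min_matches=100):
    """Pick the seed pair for baseline triangulation.
    pair_stats maps (i, j) -> (n_matches, n_homography_inliers).
    Among pairs with a healthy number of matches, choose the one
    with the LOWEST homography-inlier ratio."""
    candidates = [(h / m, (i, j))
                  for (i, j), (m, h) in pair_stats.items()
                  if m >= min_matches]
    return min(candidates)[1] if candidates else None
```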
3.4) What I do is try the triangulation using all 4 possible camera-matrix ambiguities: [R1|t1], [R1|t2], [R2|t1], [R2|t2].
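Sketched out (with `_triangulate` as a throwaway DLT helper), the four-candidate test keeps whichever pose places the most triangulated points in front of both cameras:

```python
import numpy as np

def _triangulate(P1, P2, x1, x2):
    # Minimal DLT triangulation of one correspondence (homogeneous result)
    A = np.vstack([x1[0] * P1[2] - P1[0], x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0], x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X / X[3]

def pick_pose(R1, R2, t, pts1, pts2):
    """Try all four [R|t] candidates (R1 or R2, +t or -t) and keep the
    one that places the most triangulated points in front of both
    cameras. pts1, pts2 are Nx2 arrays of matched image points."""
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R in (R1, R2):
        for s in (t, -t):
            P2 = np.hstack([R, s.reshape(3, 1)])
            count = 0
            for x1, x2 in zip(pts1, pts2):
                X = _triangulate(P1, P2, x1, x2)
                # Positive depth in both views = cheirality satisfied
                if X[2] > 0 and (P2 @ X)[2] > 0:
                    count += 1
            if count > best_count:
                best, best_count = (R, s), count
    return best
```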
8) Yes
I can recommend this project: https://github.com/godenlove007/master-opencv-book/tree/master/Chapter4_StructureFromMotion
In order to build it, you will need SSBA and PCL libraries as prerequisites. SSBA is quite simple to build but PCL can be tricky if you are planning to use Visual Studio 2013. In that case, you have to build PCL's prerequisites from source and that will take some time.
Once you build the project, you can check how the author did it and compare it with your own ideas.