As part of my master's thesis I am exploring Structure from Motion. After reading parts of the H&Z book, following online tutorials and reading through many SO posts, I have some useful results, but I also have some problems. I'm using the OpenCvSharp wrapper. All images are taken with the same camera.
What I have now:
First I calculate the initial 3D point coordinates. I do this with these steps:
1. Find corresponding points between the first two images (I use dense optical flow).
2. Compute the fundamental matrix from those correspondences (with RANSAC).
3. Get the essential matrix using the camera intrinsics (at this point I use pre-determined intrinsics) and decompose it:
// E = K^T * F * K
Mat essential = camera_matrix.T() * fundamentalMatrix * camera_matrix;
// Enforce the essential-matrix constraint: two equal singular values, one zero
SVD decomp = new SVD(essential, OpenCvSharp.SVDFlag.ModifyA);
Mat diag = new Mat(3, 3, MatType.CV_64FC1, new double[] {
1.0D, 0.0D, 0.0D,
0.0D, 1.0D, 0.0D,
0.0D, 0.0D, 0.0D
});
Mat Er = decomp.U * diag * decomp.Vt;
// Decompose the corrected essential matrix into the four candidate poses (as in H&Z)
SVD svd = new SVD(Er, OpenCvSharp.SVDFlag.ModifyA);
Mat W = new Mat(3, 3, MatType.CV_64FC1, new double[] {
0.0D, -1.0D, 0.0D,
1.0D, 0.0D, 0.0D,
0.0D, 0.0D, 1.0D
});
Mat Winv = new Mat(3, 3, MatType.CV_64FC1, new double[] {
0.0D, 1.0D, 0.0D,
-1.0D, 0.0D, 0.0D,
0.0D, 0.0D, 1.0D
});
Mat R1 = svd.U * W * svd.Vt;     // R1 = U * W * V^T
Mat T1 = svd.U.Col[2];           // t = third column of U (up to sign)
Mat R2 = svd.U * Winv * svd.Vt;  // R2 = U * W^T * V^T
Mat T2 = -svd.U.Col[2];
// The four candidate projection matrices [R|t] for the second camera
Mat[] Ps = new Mat[4];
for (int i = 0; i < 4; i++)
Ps[i] = new Mat(3, 4, MatType.CV_64FC1);
Cv2.HConcat(R1, T1, Ps[0]);
Cv2.HConcat(R1, T2, Ps[1]);
Cv2.HConcat(R2, T1, Ps[2]);
Cv2.HConcat(R2, T2, Ps[3]);
4. Check which projection matrix has the most points in front of both cameras: I triangulate the points (I tried both Cv2.TriangulatePoints and the H&Z linear method, with similar results), convert them from homogeneous coordinates, multiply them by each candidate projection matrix, P * point3D, and check for positive Z values (a sketch of this check is shown below).
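For context, here is a minimal sketch of what this cheirality check could look like in OpenCvSharp; the helper name, the 2xN point matrices and the choice to triangulate in pixel coordinates with P0 = K * [I | 0] are assumptions for the example, not the original code:

using System;
using OpenCvSharp;

static int CountPointsInFront(Mat cameraMatrix, Mat candidateRt, Mat pts1, Mat pts2)
{
    // First camera assumed at the origin: P0 = K * [I | 0]
    Mat P0 = cameraMatrix * new Mat(3, 4, MatType.CV_64FC1, new double[] {
        1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0 });
    // Candidate second camera: P1 = K * [R | t]
    Mat P1 = cameraMatrix * candidateRt;

    // pts1 / pts2: 2xN CV_64FC1 matrices of matched pixel coordinates
    Mat points4D = new Mat();
    Cv2.TriangulatePoints(P0, P1, pts1, pts2, points4D);   // 4xN homogeneous points
    points4D.ConvertTo(points4D, MatType.CV_64FC1);

    int inFront = 0;
    for (int i = 0; i < points4D.Cols; i++)
    {
        double w = points4D.At<double>(3, i);
        if (Math.Abs(w) < 1e-12)
            continue;

        // Depth in the first camera: Z after dehomogenisation
        double z0 = points4D.At<double>(2, i) / w;

        // Depth in the second camera: Z component of [R | t] * X
        Mat X = new Mat(4, 1, MatType.CV_64FC1, new double[] {
            points4D.At<double>(0, i) / w,
            points4D.At<double>(1, i) / w,
            z0,
            1.0 });
        Mat x1 = candidateRt * X;                           // 3x1
        double z1 = x1.At<double>(2, 0);

        if (z0 > 0 && z1 > 0)
            inFront++;
    }
    return inFront;
}

The candidate with the largest count is then selected; with noisy matches some points usually end up behind a camera even for the correct pose, which is why counting is more robust than requiring every point to pass.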
Then, for every new frame, I run SolvePnP (again using dense optical flow for the correspondences); with the previous projection matrix known, I triangulate the next 3D points and add them to the model. Again, the 3D visualization looks more or less correct (there is no bundle adjustment at this point).
Since I need to use SolvePnP for every new frame, I started by checking it against the projection matrix calculated for the first two images from the fundamental matrix. Theoretically the two should be the same, or almost the same: I use the initial 3D points and their corresponding 2D points in the second image. But it is not the same.
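A minimal sketch of how such a comparison can be set up in OpenCvSharp (the objectPoints / imagePoints names and the zero distortion coefficients are assumptions for the example, not the original code):

using OpenCvSharp;

// objectPoints: Nx3 CV_64FC1 Mat with the triangulated 3D points
// imagePoints:  Nx2 CV_64FC1 Mat with the corresponding pixels in the second image
Mat rvec = new Mat();
Mat tvec = new Mat();
Mat distCoeffs = new Mat(4, 1, MatType.CV_64FC1, new double[] { 0, 0, 0, 0 }); // assuming no lens distortion

Cv2.SolvePnPRansac(objectPoints, imagePoints, camera_matrix, distCoeffs, rvec, tvec);

// Convert the Rodrigues rotation vector into a 3x3 rotation matrix and
// stack [R | t], so the result is directly comparable to the 3x4 matrix
// obtained from the essential-matrix decomposition.
Mat R = new Mat();
Cv2.Rodrigues(rvec, R);

Mat Rt = new Mat(3, 4, MatType.CV_64FC1);
Cv2.HConcat(R, tvec, Rt);

SolvePnP's rvec/tvec map points from the world frame (here, the frame of the triangulated points) into the camera frame, so this [R | t] uses the same convention as the matrix built from the essential-matrix decomposition.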
Here is the one calculated by decomposing the fundamental matrix:
0.955678480016302 -0.0278536127242155 0.293091827064387 -0.148461857222772
-0.0710609269521247 0.944258717203142 0.321443338158658 -0.166586733489084
0.285707870900394 0.328023857736121 -0.900428432059693 0.974786098164824
And here is the one I got from SolvePnPRansac:
0.998124823499476 -0.0269266503551759 -0.0549708305812315 -0.0483615883381834
0.0522887223187244 0.8419572918112 0.537004476968512 -2.0699592377647
0.0318233598542908 -0.538871853288516 0.841786433426546 28.7686946357429
Both of them look like correct projection matrices, but they are different.
For those patient people who read the whole post, I have 3 questions:
1. Why are these matrices different? I know the reconstruction is only defined up to scale, but since an arbitrary scale is already fixed by the initial triangulation, SolvePnP should keep that scale.
2. I noticed one strange thing: the translation in the first matrix seems to be exactly the same no matter which images I use.
3. Is the overall algorithm correct, or am I doing something wrong? Am I missing some important step?
If more code is required let me know and I will edit the question.
Thank You!
To begin with, there is one obvious reason why the two approaches you described are unlikely to produce exactly the same projection matrices: they both estimate their results using RANSAC, an algorithm based on randomness. Since both approaches randomly select a subset of the correspondences in order to estimate a model that fits a majority of them, the result depends on the chosen correspondences.
Hence, you cannot expect to obtain exactly the same projection matrices with both approaches. However, if everything were OK, they should be quite close, which does not seem to be the case here. The two matrices you showed have very different translations, indicating that there is probably a more serious problem.
First, the fact that "the translation in the first matrix seems to be exactly the same no matter what images I use" is, to me, a strong clue that there might be a bug in your implementation. I would suggest investigating this in detail first.
Second, I do not think that using Optical Flow in a Structure from Motion workflow is appropriate. Optical Flow requires the two images to be very close (e.g. two successive frames of a video), whereas 3D triangulation of corresponding points requires a large baseline in order to be accurate. These two requirements are contradictory, which can lead to problems and inaccuracies, and hence explain the different results of the two approaches.
For instance, if the two images you consider are two successive video frames, you will not be able to triangulate points accurately, which might lead to the selection of the wrong projection matrix in step 4 and might also cause SolvePnP to estimate a wrong projection matrix. On the other hand, if the two images have a large baseline, triangulation will be accurate, but Optical Flow will probably produce a lot of mismatches, which will introduce errors into the whole workflow.
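As an illustration only (this is not part of the original answer), wide-baseline correspondences are more commonly obtained with sparse feature matching; a sketch using the OpenCvSharp3-style ORB API, with img1/img2 as placeholder images:

using System.Linq;
using OpenCvSharp;

// ORB keypoints + brute-force Hamming matching between two wide-baseline images
var orb = ORB.Create(2000);

KeyPoint[] kp1, kp2;
Mat desc1 = new Mat(), desc2 = new Mat();
orb.DetectAndCompute(img1, null, out kp1, desc1);
orb.DetectAndCompute(img2, null, out kp2, desc2);

var matcher = new BFMatcher(NormTypes.Hamming, crossCheck: true);
DMatch[] matches = matcher.Match(desc1, desc2);

// Keep only the best matches before estimating F / E from them.
DMatch[] best = matches.OrderBy(m => m.Distance).Take(500).ToArray();
Point2d[] pts1 = best.Select(m => new Point2d(kp1[m.QueryIdx].Pt.X, kp1[m.QueryIdx].Pt.Y)).ToArray();
Point2d[] pts2 = best.Select(m => new Point2d(kp2[m.TrainIdx].Pt.X, kp2[m.TrainIdx].Pt.Y)).ToArray();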
One thing you could do, in order to understand where your problems are coming from, would be to use synthetic data with known projection matrices and 3D points. You could then analyze the accuracy of each step and check whether it produces the expected results.
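A rough sketch of such a synthetic test in OpenCvSharp; the random point cloud, the ground-truth pose and all variable names are made up for the example (camera_matrix is the intrinsics matrix from the question):

using System;
using OpenCvSharp;

// Generate a random 3D point cloud a few units in front of the camera.
int n = 100;
var rng = new Random(42);
double[] data = new double[n * 3];
for (int i = 0; i < n; i++)
{
    data[3 * i + 0] = rng.NextDouble() * 2 - 1;   // X in [-1, 1]
    data[3 * i + 1] = rng.NextDouble() * 2 - 1;   // Y in [-1, 1]
    data[3 * i + 2] = rng.NextDouble() * 2 + 4;   // Z in [4, 6]
}
Mat objectPoints = new Mat(n, 3, MatType.CV_64FC1, data);

// A known ground-truth pose for the second camera (identity rotation,
// small sideways translation) and zero distortion.
Mat rvecGt = new Mat(3, 1, MatType.CV_64FC1, new double[] { 0, 0, 0 });
Mat tvecGt = new Mat(3, 1, MatType.CV_64FC1, new double[] { 0.1, 0, 0 });
Mat distCoeffs = new Mat(4, 1, MatType.CV_64FC1, new double[] { 0, 0, 0, 0 });

// Project the points with the known pose to get exact 2D observations.
Mat imagePoints = new Mat();
Cv2.ProjectPoints(objectPoints, rvecGt, tvecGt, camera_matrix, distCoeffs, imagePoints);

// These (objectPoints, imagePoints) pairs can now be fed into SolvePnPRansac,
// and the projections from two such known poses into the fundamental-matrix
// pipeline, to check that each step recovers the expected pose
// (exactly for SolvePnP, up to scale for the fundamental-matrix route).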
I'm writing to let everyone know that I did not succeed in solving this problem. I used the fundamental-matrix initial triangulation and then SolvePnP, despite knowing that the results are incorrect. It is not a perfect solution, but sometimes it works. It was enough for my whole project to be accepted and for me to graduate :)