I'm trying to calculate the cameras position for an image. I have 2 images of a rubiks cube. The first image is considered to be the base image and the next image is the image after the camera has moved. So for the first image I assume that the camera is at (0,0,0). On this image I then identify the 4 corners of the front face of the rubiks cube as shown here (4 corners identified by the 4 blue circles).
Then for the next image (after camera movement), I identify the same face of the rubiks cube as show here
So by assuming the first image as the base image, does anyone know if/how i can calculate how much the camera has moved for image 2 as shown here:
I would suggest you use OpenCV for this. I also think, this question would be more suited to StackOverflow.
The textbook on this subject would be "Multiple-View Geometry" by Hartley and Zisserman. http://www.robots.ox.ac.uk/~vgg/hzbook/ (There is a sample chapter on the Fundamental Matrix on that website.)
Basically, first find the Fundamental Matrix, then by knowing the intrinsic parameters of the camera, find a solution to the position.
Fundamental Matrix: http://en.wikipedia.org/wiki/Fundamental_matrix_%28computer_vision%29
Intrinsic Parameters: Stuff like the focal length and where the principal point is on the image plane. If you have F, then E = K^t * F * K, if K is the intrinsic matrix and the same for both images.
How to find a solution to the camera position: http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E
This is how I would do it in OpenCV. I have done this before, so it ought to work.
1. Run Feature Detection and Detector Extractor on both images.
2. Match Features.
3. Use F = cv::findFundamentalMatrix with Ransac.
4. E = K.t() * F * K. // K needs to be found beforehand.
5. Do SingularValueDecomposition of E such that E = U * S * V.t()
6. R = U * W.inv() * V.t() // W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
7. Tx = V * Z * V.t() // Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
8. get t from Tx (matrix version of cross product)
9. Find the correct solution. R.t() and -t are possiblities.
10. Get overall scale by knowing the length of the size of the Rubrik's cube.
I am certain that a more straightforward approach can also work. The benefit of this approach is that no human input is needed (unsupervised). This is not true for the optional step 10 (determining scale).
A different solution would exploit the knowledge of the geometry of the Rubrik's cube. For example, six (5.5) points are needed to estimate the position of the camera, if the point's 3D position is known.
Unfortunatly, I am not aware of any software that does this for you automatically.
So here is the alternative algorithm: Write down the coordinates of the corners of the cube as (X_i, Y_i, Z_i), and possibly also points with other knowable positions.
Mark the corresponding points u_i = (x_i, y_i). For every correspondence create two lines in a matrix A. (X_i, Y_i, Z_i, 1, 0, 0, 0, 0, -x_iX_i, -x_iY_i, -x_iZ_i -x_i) (0, 0, 0, 0, X_i, Y_i, Z_i, 1, -y_iX_i, -y_iY_i, -y_iZ_i -y_i)
Then find p such that Ap = 0. I.e. p is the right kernel of A, or the least-squared solution to Ap=0.
De-flatten p, to create a 3x4 matrix. P.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With