I am able to track 4 points across different images of the same scene by calculating a 3x3 homography between them. This lets me overlay other 2D images onto those points. Could I use this homography to render a cube at that position instead, using OpenGL? I suspect the 3x3 matrix alone doesn't give enough information, but if I know the camera calibration matrix, can I recover enough to build a modelview matrix for this?
Thank you for any help you can give.
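The starting point of the question, a 3x3 homography from 4 tracked points, can be estimated with the basic direct linear transform (DLT). Below is a minimal NumPy sketch; `homography_dlt` is a hypothetical helper name. For the pose recovery described in the answer, the source points should be the planar object's own coordinates (for example the corners of a square of known size lying on z = 0), not pixel positions from another image, because only then does K^-1 H have the form [r1 r2 t]. With noisy pixel data, normalizing the points first (Hartley normalization) improves conditioning; that step is omitted here.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3) such that dst ~ H @ [x, y, 1] from >= 4 point pairs.

    Basic DLT without coordinate normalization (sketch only).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.array(rows)
    # h is the right singular vector for the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    # Fix the arbitrary scale (assumes H[2, 2] is not zero).
    return H / H[2, 2]
```

With exactly four points in general position the system has a one-dimensional null space, so the estimate is exact up to floating-point error.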
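If you have the camera calibration matrix (intrinsic parameters) and the homography H between the planar object (in its own coordinates) and its image, you can recover the view transform. A point in object coordinates is projected into the image by the 3x4 camera matrix:
P = K[R|T]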
where K is the 3x3 calibration matrix, and R (3x3 rotation matrix) and T (3x1 translation vector) form the view transform (from object coordinates to camera coordinates). There is a lot to say about how to compute R and T from H. One way is to compute a direct (closed-form) solution; the other is to use a non-linear minimization technique. The latter is generally better, since it iteratively refines the estimate and gives a more accurate solution. The former is just a quick way to start doing augmented reality ;)
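Since R and T map object coordinates to camera coordinates, they are exactly what OpenGL calls the modelview matrix, which is what the question asks for. Below is a minimal NumPy sketch, assuming R and T follow the usual computer-vision convention (x right, y down, camera looking along +z) and that the target is legacy OpenGL, where `glLoadMatrixd` takes 16 values in column-major order. The y/z flip is a common convention; the right flip depends on how the matching projection matrix is built from K, which is not shown. `modelview_from_pose` is a hypothetical name.

```python
import numpy as np

def modelview_from_pose(R, t):
    """Return the pose [R|t] as 16 column-major values for an OpenGL modelview.

    Assumption: computer-vision camera convention (y down, looking along +z).
    OpenGL eye space has y up and looks along -z, so the y and z rows are negated.
    """
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = np.ravel(t)
    cv_to_gl = np.diag([1.0, -1.0, -1.0, 1.0])
    # flatten(order="F") walks the matrix column by column (OpenGL layout).
    return (cv_to_gl @ M).flatten(order="F")
```

For R = I and t = (1, 2, 3), the last four values (the fourth column) are 1, -2, -3, 1: the translation with its y and z components flipped.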
Let's see how to derive R and T using the direct method. If h1, h2 and h3 are the column vectors of H, H can be written in terms of K, R and T as:
H = K [r1 r2 t]
(remember that we are dealing with points on the plane z=0, so the third column r3 of R is multiplied by zero and drops out of the projection)
where r1 is the first column vector of R, r2 the second and t is the translation vector. Then:
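To make the z=0 remark concrete, here is a small NumPy check (the example values are made up) that projecting a plane point (x, y, 0, 1) with the full 3x4 matrix K[R|T] gives the same pixel as applying K[r1 r2 t] to (x, y, 1). The function names are hypothetical helpers.

```python
import numpy as np

def projection_matrix(K, R, t):
    """3x4 camera matrix P = K [R | t]."""
    return K @ np.column_stack((R, np.ravel(t)))

def plane_homography(K, R, t):
    """3x3 homography H = K [r1 r2 t] for object points on the plane z = 0."""
    return K @ np.column_stack((R[:, 0], R[:, 1], np.ravel(t)))

def to_pixels(x):
    """Dehomogenize a homogeneous image point."""
    return x[:2] / x[2]

# Example values: rotation of 0.3 rad about the x axis, plane about 5 units away.
a = 0.3
R = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
t = np.array([0.1, -0.2, 5.0])
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
x, y = 0.5, 0.25
p_full = to_pixels(projection_matrix(K, R, t) @ np.array([x, y, 0.0, 1.0]))
p_plane = to_pixels(plane_homography(K, R, t) @ np.array([x, y, 1.0]))
```

`p_full` and `p_plane` agree up to rounding, because the z=0 coordinate zeroes out the r3 column of P.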
r1 = l1 * (K^-1) h1
r2 = l2 * (K^-1) h2
r3 = r1 x r2
(cross product between r1 and r2)
t = l3 * (K^-1) h3
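where l1, l2 and l3 are real scaling factors. They are needed because H is only known up to scale, while r1 and r2, as columns of a rotation matrix, must have unit length: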
l1 = 1 / norm((K^-1)*h1)
l2 = 1 / norm((K^-1)*h2)
l3 = (l1+l2)/2
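The direct method above translates almost line by line into NumPy. This is a sketch under stated assumptions: H maps plane coordinates (z=0) to pixels, and `pose_from_homography` and `nearest_rotation` are hypothetical names. Two steps are additions not in the formulas above: a sign check (H is only defined up to sign, and the wrong sign puts the plane behind the camera) and an optional SVD projection onto the closest rotation, since with noisy data r1 and r2 are rarely exactly orthogonal.

```python
import numpy as np

def pose_from_homography(H, K):
    """Direct pose estimate from a plane-to-image homography H ~ K [r1 r2 t].

    r1 and r2 are normalized separately, t is scaled by their average,
    and r3 = r1 x r2. Returns (R, t).
    """
    Kinv = np.linalg.inv(K)
    a1 = Kinv @ H[:, 0]
    a2 = Kinv @ H[:, 1]
    a3 = Kinv @ H[:, 2]
    l1 = 1.0 / np.linalg.norm(a1)
    l2 = 1.0 / np.linalg.norm(a2)
    l3 = (l1 + l2) / 2.0
    r1, r2, t = l1 * a1, l2 * a2, l3 * a3
    # Added step: pick the sign of H that gives positive depth (t_z > 0).
    if t[2] < 0:
        r1, r2, t = -r1, -r2, -t
    r3 = np.cross(r1, r2)
    return np.column_stack((r1, r2, r3)), t


def nearest_rotation(R):
    """Optional extra step: closest rotation matrix to R (Frobenius norm) via SVD."""
    U, _, Vt = np.linalg.svd(R)
    # Force det = +1 so the result is a proper rotation, not a reflection.
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt
```

With noise-free input, the recovered R and t match the true pose even when H carries an arbitrary (including negative) scale factor.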
Keep in mind that this solution should be refined using a non-linear minimization method (you can use it as the starting point). You can also use a distortion model to compensate for lens distortion, but this step is optional (you will get good results even without it).
If you want to use a minimization method to compute a better approximation of R and T, there are many different approaches. I suggest reading the paper
"Fast and globally convergent pose estimation from video images", Lu, Hager
which presents one of the best algorithms out there for your purpose.
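For comparison, here is a much simpler (and slower) refinement than the algorithm in that paper: plain Levenberg-Marquardt on the reprojection error of the tracked points, with a finite-difference Jacobian. It is a minimal NumPy sketch, not the algorithm from Lu and Hager's paper. Assumptions: object points lie on z=0 in the plane's own units, the rotation is updated as R = R0 · exp([w]x) with a small axis-angle vector w starting at zero, and lens distortion is ignored. The names `rodrigues`, `reprojection_residuals` and `refine_pose` and their parameters are hypothetical.

```python
import numpy as np

def rodrigues(w):
    """Rotation matrix for the axis-angle vector w (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    Kx = np.array([[0, -k[2], k[1]],
                   [k[2], 0, -k[0]],
                   [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)

def reprojection_residuals(params, R0, K, obj_xy, img_uv):
    """Predicted minus observed pixels for R = R0 @ rodrigues(w), translation t."""
    w, t = params[:3], params[3:]
    R = R0 @ rodrigues(w)
    X = np.column_stack((obj_xy, np.zeros(len(obj_xy))))  # points on z = 0
    proj = (X @ R.T + t) @ K.T  # object -> camera -> homogeneous pixels
    uv = proj[:, :2] / proj[:, 2:3]
    return (uv - img_uv).ravel()

def refine_pose(R0, t0, K, obj_xy, img_uv, iters=50, lam=1e-3, eps=1e-6):
    """Levenberg-Marquardt refinement of an initial pose (R0, t0)."""
    args = (R0, K, np.asarray(obj_xy, float), np.asarray(img_uv, float))
    params = np.concatenate((np.zeros(3), np.asarray(t0, float)))
    r = reprojection_residuals(params, *args)
    for _ in range(iters):
        # Forward-difference Jacobian of the residuals w.r.t. the 6 parameters.
        J = np.empty((r.size, 6))
        for j in range(6):
            step = np.zeros(6)
            step[j] = eps
            J[:, j] = (reprojection_residuals(params + step, *args) - r) / eps
        A, g = J.T @ J, J.T @ r
        delta = np.linalg.solve(A + lam * np.diag(np.diag(A)), -g)
        new_r = reprojection_residuals(params + delta, *args)
        if new_r @ new_r < r @ r:
            params, r = params + delta, new_r  # accept: behave more like Gauss-Newton
            lam *= 0.5
        else:
            lam *= 10.0  # reject: behave more like gradient descent
        if np.linalg.norm(delta) < 1e-12:
            break
    return R0 @ rodrigues(params[:3]), params[3:]
```

The (R, t) returned by the direct method is a natural choice for R0 and t0, and the tracked square's corners in plane units and in pixels are natural choices for `obj_xy` and `img_uv`.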