Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OpenCV: Strange rotation and translation matrices from decomposeHomographyMat

I am trying to solve a plane2plane projection problem using the findHomography function found in the OpenCV library. As a toy example I have a set of points in R2, P, and a second set of points in R2, Q, where Qx = Px+50, Qy = Py. Meaning I've offset the x-coordinates by 50. Now I run the following code:

Mat projectionMatrix = findHomography(Q, P);
vector<Point2f> projectedPoints(objectCoordinates.size());
perspectiveTransform(Q, projectedPoints, projectionMatrix);

And that gives me P, which is great. However, now I would like the rotation and translation matrices, R & T, and this is where I get confused. OpenCV 3 has a function called decomposeHomographyMat that returns up to 4 solutions for R and T (also returns the normals but I do not store those). I run it as such:

vector<Mat> Rs;
vector<Mat> Ts;
decomposeHomographyMat(projectionMatrix, cameraMatrix, Rs, Ts, noArray());

The cameraMatrix I use is tried and tested from previous experiments. Ok, so I get my four results. Looking at them I notice that I get the identity matrix as a result for all R which is great. However, all of the translation vectors are [0,0,0]T, whereas I would expect at least one of them to be [-50,0,0]T. Is there something I need to do with the result from decomposeHomographyMat to get the expected behaviour?

Thanks

like image 610
IlliteratePhD Avatar asked Mar 11 '16 13:03

IlliteratePhD


2 Answers

It turns out I was wrong in some points, so I have decided to rewrite this answer.

In brief - you are getting strange results because of incorrect intrinsic parameter matrix.

Using the terminology from the paper "Malis, E and Vargas, M, "Deeper understanding of the homography decomposition for vision-based control" (on which the homography decomposition in OpenCV is based), the perspective transformation is denoted by H, and called a Euclidean homography matrix, and result of its normalization G = K^-1 * H * K (where K is camera's calibration matrix) is called a homography matrix

Both cv::findHomography() and cv::decomposeHomographyMat() work with Euclidean homography matrix. But in order to decompose it into translation and rotation, cv::decomposeHomographyMat() normalizes Euclidean homography matrix to obtain homography matrix. It relies on K, provided from user, in order to perform this normalization.

As for the estimation of K, I think this is beyond the scope of this question. This problem is called Camera auto-calibration, here's a relevant quote from this wiki article:

Therefore, three views are the minimum needed for full calibration with fixed intrinsic parameters between views. Quality modern imaging sensors and optics may also provide further prior constraints on the calibration such as zero skew (orthogonal pixel grid) and unity aspect ratio (square pixels). Integrating these priors will reduce the minimal number of images needed to two.

It seems, that you can extract the K from image correspondances over 2 frames from the same camera under the zero skew and square pixel assumptions. But, I am not familiar with this subject, so can't give you more suggestions.

So, to check if my interpretation is correct, I have made a small example, that projects some points on a plane in 3D on 2 virtual cameras, finds a homography, decomposes it, and allows comparing this decomposition to ground truth rotation and translation vectors. This is better than real inputs, because this way we know the K precisely, and can decouple error in its estimation from error in R and t. For inputs I have checked it was able to correctly estimate rotation and translation vectors, although for some reason translation is always smaller than ground truth 10 times. Maybe this decomposition is defined only up to a scale (I'm not sure now), but it is interesting, that it is related to the ground truth value with a fixed coefficient.

Here's the source:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>


int main() {
  // set up a virtual camera
  float f = 100, w = 640, h = 480;

  cv::Mat1f K = (cv::Mat1f(3, 3) <<
      f, 0, w/2,
      0, f, h/2,
      0, 0,   1);

  // set transformation from 1st to 2nd camera (assume K is unchanged)
  cv::Mat1f rvecDeg = (cv::Mat1f(3, 1) << 45, 12, 66);
  cv::Mat1f t = (cv::Mat1f(3, 1) << 100, 200, 300);

  std::cout << "-------------------------------------------\n";
  std::cout << "Ground truth:\n";

  std::cout << "K = \n" << K << std::endl << std::endl;
  std::cout << "rvec = \n" << rvecDeg << std::endl << std::endl;
  std::cout << "t = \n" << t << std::endl << std::endl;

  // set up points on a plane
  std::vector<cv::Point3f> p3d{{0, 0, 10},
                               {100, 0, 10},
                               {0, 100, 10},
                               {100, 100, 10}};

  // project on both cameras
  std::vector<cv::Point2f> Q, P, S;

  cv::projectPoints(p3d,
                    cv::Mat1d::zeros(3, 1),
                    cv::Mat1d::zeros(3, 1),
                    K,
                    cv::Mat(),
                    Q);

  cv::projectPoints(p3d,
                    rvecDeg*CV_PI/180,
                    t,
                    K,
                    cv::Mat(),
                    P);

  // find homography
  cv::Mat H = cv::findHomography(Q, P);

  std::cout << "-------------------------------------------\n";
  std::cout << "Estimated H = \n" << H << std::endl << std::endl;


  // check by reprojection
  std::vector<cv::Point2f> P_(P.size());
  cv::perspectiveTransform(Q, P_, H);

  float sumError = 0;

  for (size_t i = 0; i < P.size(); i++) {
    sumError += cv::norm(P[i] - P_[i]);
  }

  std::cout << "-------------------------------------------\n";
  std::cout << "Average reprojection error = "
      << sumError/P.size() << std::endl << std::endl;


  // decompose using identity as internal parameters matrix
  std::vector<cv::Mat> Rs, Ts;
  cv::decomposeHomographyMat(H,
                             K,
                             Rs, Ts,
                             cv::noArray());

  std::cout << "-------------------------------------------\n";
  std::cout << "Estimated decomposition:\n\n";
  std::cout << "rvec = " << std::endl;
  for (auto R_ : Rs) {
    cv::Mat1d rvec;
    cv::Rodrigues(R_, rvec);
    std::cout << rvec*180/CV_PI << std::endl << std::endl;
  }

  std::cout << std::endl;

  std::cout << "t = " << std::endl;
  for (auto t_ : Ts) {
    std::cout << t_ << std::endl << std::endl;
  }

  return 0;
}

And here's the output on my machine:

-------------------------------------------
Ground truth:
K =
[100, 0, 320;
0, 100, 240;
0, 0, 1]

rvec =
[45;
12;
66]

t =
[100;
200;
300]

-------------------------------------------
Estimated H =
[0.04136041220427821, 0.04748763375951008, 358.5557917287962;
0.05074854454707714, 0.06137211243830468, 297.4585754092336;
8.294458769850147e-05, 0.0002294875562580223, 1]

-------------------------------------------
Average reprojection error = 0

-------------------------------------------
Estimated decomposition:

rvec =
[-73.21470385654712;
56.64668212487194;
82.09114210289061]

[-73.21470385654712;
56.64668212487194;
82.09114210289061]

[45.00005330430893;
12.00000697952995;
65.99998380038915]

[45.00005330430893;
12.00000697952995;
65.99998380038915]


t =
[10.76993852870029;
18.60689642878277;
30.62344129378435]

[-10.76993852870029;
-18.60689642878277;
-30.62344129378435]

[10.00001378255982;
20.00002581449634;
30.0000336510648]

[-10.00001378255982;
-20.00002581449634;
-30.0000336510648]

As you can see, there is correct estimation of rotation vector among the hypothesis, and there's a up-to-scale correct estimation of translation.

like image 54
alexisrozhkov Avatar answered Nov 06 '22 20:11

alexisrozhkov


You should use solvePnP instead on each plane and then get the relative translation and rotation from the two camera matrices (assuming you have at least 4 points). The problem with decomposeHomographyMat is that you can get up to 4 solutions... You can remove two that will land out of the image FOV, but this is still a problem.

  • @alexisrozhkov- I've found the error in your code- The article and the function both assume that the Z plane is at 1 (not z=10 like you had). Added below is a correct re implantation of your above code (fixed and in python):
import cv2
import numpy as np


# set up a virtual camera
f = 100
w = 640
h = 480

K = np.array([[f, 0, w/2],
              [0, f, h/2],
              [0, 0,   1]])
dist_coffs = np.zeros(5)
# set transformation from 1st to 2nd camera (assume K is unchanged)
rvecDeg = np.array([[45, 12, 66]]).T
t = np.array([[100.0, 200, 300]]).T

print("-------------------------------------------\n")
print("Ground truth:\n")

print("K = \n" + str(K))
print("rvec = \n" + str(rvecDeg))
print("t = \n" + str(t))

# set up points on a plane
p3d = np.array([[0, 0, 1],
                [100, 0, 1],
                [0, 100, 1],
                [100, 100, 1]], dtype=np.float64)

# project on both cameras

Q, _ = cv2.projectPoints(p3d,
                         np.zeros((3, 1)),
                         np.zeros((3, 1)),
                         K,
                         dist_coffs)

P, _ = cv2.projectPoints(p3d,
                         rvecDeg*np.pi/180,
                         t,
                         K,
                         dist_coffs)

# find homography
H, _ = cv2.findHomography(Q, P)

print("-------------------------------------------\n")
print("Estimated H = \n" + str(H))


# check by reprojection
P_ = cv2.perspectiveTransform(Q, H)

sumError = 0

for i in range(P.shape[0]):
    sumError += np.linalg.norm(P[i] - P_[i])


print("-------------------------------------------\n")
print("Average reprojection error = "+str(sumError/P.shape[0]))


# decompose using identity as internal parameters matrix
num_res, Rs, ts, n = cv2.decomposeHomographyMat(H, K)

print("-------------------------------------------\n")
print("Estimated decomposition:\n\n")
for i, Rt in enumerate(zip(Rs, ts)):
    R, t = Rt
    print("option " + str(i+1))
    print("rvec = ")
    rvec, _ = cv2.Rodrigues(R)
    print(rvec*180/np.pi)
    print("t = ")
    print(t)

and this is the result (option 3 came out as the correct result):

-------------------------------------------

Ground truth:

K = 
[[100.   0. 320.]
 [  0. 100. 240.]
 [  0.   0.   1.]]
rvec = 
[[45]
 [12]
 [66]]
t = 
[[100.]
 [200.]
 [300.]]
-------------------------------------------

Estimated H = 
[[3.93678513e-03 4.51998690e-03 3.53830416e+02]
 [4.83037187e-03 5.84154353e-03 3.05790229e+02]
 [7.89487379e-06 2.18431675e-05 1.00000000e+00]]
-------------------------------------------

Average reprojection error = 1.1324020050061362e-05
-------------------------------------------

Estimated decomposition:


option 1
rvec = 
[[-78.28555877]
 [ 58.01301837]
 [ 81.41634175]]
t = 
[[100.79816988]
 [198.59277542]
 [300.6675498 ]]
option 2
rvec = 
[[-78.28555877]
 [ 58.01301837]
 [ 81.41634175]]
t = 
[[-100.79816988]
 [-198.59277542]
 [-300.6675498 ]]
option 3
rvec = 
[[45.0000205 ]
 [12.00005488]
 [65.99999505]]
t = 
[[100.00010826]
 [200.00026791]
 [300.00034698]]
option 4
rvec = 
[[45.0000205 ]
 [12.00005488]
 [65.99999505]]
t = 
[[-100.00010826]
 [-200.00026791]
 [-300.00034698]]
like image 28
YoniChechik Avatar answered Nov 06 '22 18:11

YoniChechik