Why can't Direct Linear Transformation (DLT) give the optimal camera extrinsics?

Question

I'm reading the source code of the function solvePnP() in OpenCV. When the flags parameter keeps its default value SOLVEPNP_ITERATIVE, it calls cvFindExtrinsicCameraParams2, which FIRST uses the DLT algorithm (if the 3D points are non-planar) to initialize the 6DOF camera pose, and SECOND uses the CvLevMarq solver to minimize the reprojection error.

My question is: the DLT formulates the problem as a linear least-squares problem and solves it with the SVD, so it seems to give an optimal solution. Why do we still run the iterative Levenberg-Marquardt method afterwards?

Or, put differently: what issue or limitation makes the DLT algorithm inferior? Why does the closed-form solution only reach a LOCAL minimum of the cost function?

BConic · Accepted Answer

When you want to find the solution to a problem, the first step is to express this problem in mathematical terms; you can then use existing mathematical tools to solve the resulting equations. However, interesting problems can usually be expressed in many different mathematical ways, each of which may lead to a slightly different solution. It then takes work to analyze the different methods and understand which one provides the most stable/accurate/efficient/etc. solution.

In the case of the PnP problem, we want to find the camera pose given associations between 3D points and their projections onto the image plane.
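For reference, the standard pinhole model behind this problem can be written as follows, assuming known intrinsics $K$ and no lens distortion (the symbols are introduced here for illustration): each 3D point $X_i$ projects to the pixel $(u_i, v_i)$ with depth $s_i$, and PnP asks for the rotation $R$ and translation $t$.

$$
s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}
= K \,[\,R \mid t\,] \begin{bmatrix} X_i \\ 1 \end{bmatrix},
\qquad R \in SO(3),\; t \in \mathbb{R}^3,\; i = 1, \dots, N
$$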

A first way to express this problem mathematically is to cast it as a linear least-squares problem. This approach is known as the DLT approach, and it is interesting because linear least-squares problems have a closed-form solution that can be found robustly using the Singular Value Decomposition. However, this approach treats the camera pose P as a general 3×4 matrix with 12 unknown entries, when a pose really has only 6 degrees of freedom (3 for the 3D rotation plus 3 for the 3D translation). To obtain a 6DOF camera pose from the result, an extra approximation step is needed (typically replacing the estimated 3×3 block by the nearest rotation matrix), and that step is not covered by the linear cost function of the DLT. In addition, the quantity the DLT minimizes is an algebraic residual rather than the geometric (reprojection) error. Both effects lead to an inaccurate solution.
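A minimal numpy sketch of the DLT idea, assuming known intrinsics (image points already normalized by K^-1), no lens distortion, and at least 6 non-planar points. `dlt_pose` is a hypothetical helper written for illustration; it is not OpenCV's implementation. The last lines are the approximation step: the 3×3 block of the SVD solution is replaced by the nearest rotation matrix, which the linear cost never sees.

```python
import numpy as np


def dlt_pose(X, x):
    """DLT pose sketch (hypothetical helper, not OpenCV code).

    X: (N, 3) 3D points, N >= 6, not all coplanar.
    x: (N, 2) normalized image coordinates (pixels already mapped by K^-1).
    Returns (R, t) with R a proper rotation matrix.
    """
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])  # homogeneous 3D points, (N, 4)
    # Each correspondence gives two equations linear in the 12 entries of
    # M = [R | t]:  m1.Xh - u * m3.Xh = 0  and  m2.Xh - v * m3.Xh = 0.
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -x[:, [0]] * Xh
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -x[:, [1]] * Xh
    # Linear least squares under ||m|| = 1: the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    M = Vt[-1].reshape(3, 4)  # equals lambda * [R | t] in the noise-free case
    # The 12 entries were free, so M[:, :3] is generally not a rotation.
    # Approximation: project it onto the nearest rotation (Frobenius norm).
    U, S, Vt3 = np.linalg.svd(M[:, :3])
    R = U @ Vt3
    scale = S.mean()  # estimate of |lambda|
    if np.linalg.det(R) < 0:  # null vector came out with a negative sign
        R = -R
        scale = -scale
    t = M[:, 3] / scale
    return R, t
```

With noise-free data this recovers the pose exactly. With noisy data the orthogonalization changes the matrix, so the returned pose no longer minimizes even the algebraic cost, which is one reason a non-linear refinement follows.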

A second way to express the PnP problem mathematically is to use the geometric error (the reprojection error in the image) as a cost function and to find the camera pose that minimizes it. Since the geometric error is non-linear in the pose, this approach estimates the solution using iterative solvers, such as the Levenberg-Marquardt algorithm. Such algorithms can work directly with the 6 degrees of freedom of the camera pose, leading to accurate solutions. However, since they are iterative, they need an initial estimate of the solution, which in practice is often obtained using the DLT approach.
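A compact sketch of the refinement step, assuming normalized image coordinates and parameterizing the pose by 6 numbers: a rotation vector (axis times angle, the representation `cv::Rodrigues` converts) plus a translation. `rodrigues`, `residuals`, and `refine_pose_lm` are hypothetical names; this is a plain Levenberg-Marquardt loop with a finite-difference Jacobian, not a port of `CvLevMarq`.

```python
import numpy as np


def rodrigues(rvec):
    """Rotation vector (axis * angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def residuals(params, X, x):
    """Reprojection residuals (2N,) for params = (rvec[3], t[3])."""
    R = rodrigues(params[:3])
    Xc = X @ R.T + params[3:]
    return (Xc[:, :2] / Xc[:, 2:] - x).ravel()


def refine_pose_lm(params0, X, x, iters=50, lam=1e-3):
    """Minimal Levenberg-Marquardt over the 6 pose parameters."""
    p = np.asarray(params0, dtype=float).copy()
    r = residuals(p, X, x)
    cost = r @ r
    eps = 1e-7
    for _ in range(iters):
        # Forward-difference Jacobian (2N x 6); an analytic one is faster.
        J = np.empty((r.size, 6))
        for j in range(6):
            dp = np.zeros(6)
            dp[j] = eps
            J[:, j] = (residuals(p + dp, X, x) - r) / eps
        H = J.T @ J
        g = J.T @ r
        # Damped normal equations: (H + lam * diag(H)) delta = -J^T r
        delta = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        if np.linalg.norm(delta) < 1e-12:
            break
        p_new = p + delta
        r_new = residuals(p_new, X, x)
        cost_new = r_new @ r_new
        if cost_new < cost:  # accept: behave more like Gauss-Newton
            p, r, cost = p_new, r_new, cost_new
            lam *= 0.1
        else:  # reject: behave more like gradient descent
            lam *= 10.0
    return p, cost
```

In practice `params0` would come from the DLT estimate, with its rotation matrix converted to a rotation vector. Here the residuals are measured in normalized image coordinates; with known K they could equally be measured in pixels.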

Now to answer the title of your question: sure, the DLT algorithm gives the optimal camera extrinsics, but only in the sense of the linear cost function it solves. Over the years, researchers have found more complex cost functions that lead to more accurate solutions but are also more difficult to solve.
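To make "optimal in the sense of the linear cost function" concrete, one common way to write the two objectives side by side is shown below (notation introduced here: $A$ stacks two rows per correspondence, $m$ holds the 12 entries of the 3×4 matrix, $K$ is the intrinsic matrix, $\pi$ is perspective division). The DLT minimizer of the first objective ignores the constraint $R \in SO(3)$ and, with noisy data, is generally not a minimizer of the second.

$$
\text{DLT (algebraic, 12 unknowns):}\qquad \min_{\|m\|_2 = 1} \; \|A\,m\|_2^2
$$

$$
\text{Geometric (6 DOF):}\qquad \min_{R \in SO(3),\; t \in \mathbb{R}^3} \; \sum_{i=1}^{N} \left\| \begin{bmatrix} u_i \\ v_i \end{bmatrix} - \pi\!\left(K (R X_i + t)\right) \right\|_2^2,
\qquad \pi(a, b, c) = \left(\tfrac{a}{c}, \tfrac{b}{c}\right)
$$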

Tags: opencv, least-squares, extrinsic-parameters, opencv-solvepnp

zhangxaochen
