
Converting a 2D image point to a 3D world point

I know that in the general case this conversion is impossible, since depth information is lost going from 3D to 2D.

However, I have a fixed camera and I know its camera matrix. I also have a planar calibration pattern of known dimensions - let's say that in world coordinates its corners are (0,0,0), (2,0,0), (2,1,0), (0,1,0). Using OpenCV I can estimate the pattern's pose, giving the rotation and translation needed to project a point on the pattern to a pixel in the image.

Now: this 3D-to-image projection is easy, but what about the other way? If I pick a pixel in the image that I know is part of the calibration pattern, how can I get the corresponding 3D point?

I could iteratively choose some random 3d point on the calibration pattern, project to 2d, and refine the 3d point based on the error. But this seems pretty horrible.

Given that this unknown point has world coordinates something like (x,y,0) -- since it must lie on the z=0 plane -- it seems like there should be some transformation that I can apply, instead of doing the iterative nonsense. My maths isn't very good though - can someone work out this transformation and explain how you derive it?
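For concreteness, the "easy" forward direction (a world point on the pattern projected to a pixel) can be sketched in a few lines of numpy. This assumes R and t are the pose from something like cv2.solvePnP, K is the camera matrix, and distortion is ignored; the function name is mine:

```python
import numpy as np

def project_point(K, R, t, Pw):
    """Project a 3D world point Pw into pixel coordinates (distortion ignored)."""
    pc = R @ Pw + t         # world -> camera coordinates
    px = K @ (pc / pc[2])   # perspective divide, then apply intrinsics
    return px[:2]
```

The question below asks for the inverse of this mapping for points known to lie on the pattern's plane.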

asked Jan 25 '13 by FusterCluck


2 Answers

Here is a closed-form solution that I hope can help someone. Using the conventions in the image from your comment above, you can use the centered-normalized pixel coordinates u and v (pixel coordinates with the intrinsics removed, usually after distortion correction) and the extrinsic calibration data, like this:

|Tx|   |r11 r21 r31| |-t1|
|Ty| = |r12 r22 r32|.|-t2|
|Tz|   |r13 r23 r33| |-t3|

|dx|   |r11 r21 r31| |u|
|dy| = |r12 r22 r32|.|v|
|dz|   |r13 r23 r33| |1|

With these intermediate values, the coordinates you want are:

X = (-Tz/dz)*dx + Tx
Y = (-Tz/dz)*dy + Ty

Explanation:

The vector [t1, t2, t3]^T is the position of the origin of the world coordinate system (the (0,0) corner of your calibration pattern) with respect to the camera optical center; by reversing the signs and inverting the rotation transformation we obtain the vector T = [Tx, Ty, Tz]^T, which is the position of the camera center in the world reference frame.

Similarly, [u, v, 1]^T is the direction along which the observed point lies in the camera reference frame (a ray starting from the camera center). By inverting the rotation transformation we obtain the vector d = [dx, dy, dz]^T, which represents the same direction in the world reference frame.

To invert the rotation transformation we take advantage of the fact that the inverse of a rotation matrix is its transpose.

Now we have a line with direction vector d starting from point T; the intersection of this line with the plane Z = 0 is given by the second set of equations. Note that it would be similarly easy to find the intersection with the X = 0 or Y = 0 planes, or with any plane parallel to them.
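The recipe above fits in a few lines of numpy. This is a minimal sketch: pixel_to_plane_z0 is a name I made up, R and t are the extrinsics (e.g. from cv2.solvePnP, with the rotation vector converted via cv2.Rodrigues), and (u, v) are assumed to already be centered-normalized and undistorted:

```python
import numpy as np

def pixel_to_plane_z0(R, t, u, v):
    """Intersect the viewing ray through normalized pixel (u, v) with the world plane Z = 0."""
    T = -R.T @ t                     # camera center in world coordinates
    d = R.T @ np.array([u, v, 1.0])  # ray direction in world coordinates
    s = -T[2] / d[2]                 # scale at which the ray meets Z = 0
    return np.array([T[0] + s * d[0], T[1] + s * d[1], 0.0])
```

A quick sanity check is to project a known world point (x, y, 0) forward into normalized coordinates and confirm this function recovers it.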

answered Sep 18 '22 by Milo


Yes, you can. If you have the transformation that maps a 3D world point to the image plane, you can invert it to map an image point back to the world: for points on a known plane (here Z = 0) the projection reduces to an invertible 3x3 homography, so knowing Z = 0 pins down a unique solution. There is no need to iteratively choose random 3D points. I had a similar problem, with a camera mounted on a vehicle at a known position and with a known camera calibration matrix, where I needed the real-world location of a lane marking captured on the image plane of the camera.
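A minimal numpy sketch of this homography approach: for points with Z = 0, the third column of R drops out of the projection, leaving the 3x3 matrix H = K [r1 r2 t], whose inverse takes a pixel back to (X, Y). The function names are mine, and pixels are assumed undistorted:

```python
import numpy as np

def plane_homography(K, R, t):
    """Homography mapping world (X, Y) on the Z = 0 plane to homogeneous pixel coordinates."""
    # s * [px, py, 1]^T = K [r1 r2 t] [X, Y, 1]^T for points with Z = 0
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def pixel_to_world(K, R, t, px, py):
    """Map a pixel back to (X, Y, 0) by inverting the plane homography."""
    w = np.linalg.inv(plane_homography(K, R, t)) @ np.array([px, py, 1.0])
    return np.array([w[0] / w[2], w[1] / w[2], 0.0])
```

Note the homography is only invertible when the camera is not in the Z = 0 plane itself; in practice a near-grazing view also makes the inversion numerically fragile.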

answered Sep 19 '22 by Diarmaid O Cualain