To calculate world coordinates from screen coordinates with OpenCV

1 Answer

First, to understand the calculation, it helps to read a bit about the pinhole camera model and simple perspective projection. For a quick glimpse, check this. I'll try to update with more.

So, let's start with the opposite direction, which describes how a camera works: projecting a 3d point in the world coordinate system to a 2d point in our image. According to the camera model:

P_screen = I * P_world

or (using homogeneous coordinates, where the result is scaled by z_world)

z_world * | x_screen | = I * | x_world |
          | y_screen |       | y_world |
          |    1     |       | z_world |
                             |    1    |

where

I = | f_x    0    c_x    0 | 
    |  0    f_y   c_y    0 |
    |  0     0     1     0 |

is the 3x4 intrinsics matrix, f being the focal length (in pixels) and c the principal point, i.e. the pixel where the optical axis meets the image.

If you solve the system above, you get:

x_screen = (x_world/z_world)*f_x + c_x
y_screen = (y_world/z_world)*f_y + c_y

But you want to do the reverse, so solving for the world coordinates gives your answer:

x_world = (x_screen - c_x) * z_world / f_x
y_world = (y_screen - c_y) * z_world / f_y

z_world is the depth the Kinect returns to you and you know f and c from your intrinsics calibration, so for every pixel, you apply the above to get the actual world coordinates.
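The two pairs of equations above can be sketched in a few lines of C++. This is only a minimal illustration: the `Intrinsics`, `Point3` and `Pixel` structs and the function names are hypothetical helpers I made up here, not OpenCV types, and you would fill `Intrinsics` with the values from your own calibration.

```cpp
#include <cassert>

// Pinhole intrinsics: focal lengths and principal point, all in pixels.
// (Hypothetical helper types for illustration, not part of OpenCV.)
struct Intrinsics {
    float fx, fy, cx, cy;
};

struct Point3 {
    float x, y, z;
};

struct Pixel {
    float u, v;
};

// Forward model: project a 3d point to a pixel.
//   x_screen = (x_world / z_world) * f_x + c_x
//   y_screen = (y_world / z_world) * f_y + c_y
Pixel project(const Intrinsics& k, const Point3& p) {
    assert(p.z != 0.0f);  // a point with zero depth has no projection
    return { p.x / p.z * k.fx + k.cx,
             p.y / p.z * k.fy + k.cy };
}

// Inverse: recover the 3d point from a pixel and its measured depth.
//   x_world = (x_screen - c_x) * z_world / f_x
//   y_world = (y_screen - c_y) * z_world / f_y
Point3 backProject(const Intrinsics& k, const Pixel& px, float depth) {
    assert(k.fx != 0.0f && k.fy != 0.0f);
    return { (px.u - k.cx) * depth / k.fx,
             (px.v - k.cy) * depth / k.fy,
             depth };
}
```

You would call `backProject` once per pixel of the depth image, passing the pixel position and the depth the sensor reported for it. The result is in whatever unit the depth is in (for example millimetres if the depth is in millimetres).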

Edit 1 (why the above corresponds to world coordinates, and what the extrinsics we get during calibration are):

First, check this one. It explains the various coordinate systems very well.

Your 3d coordinate systems are: Object ---> World ---> Camera. There is a transformation that takes you from object coordinate system to world and another one that takes you from world to camera (the extrinsics you refer to). Usually you assume that:

Either the Object system coincides with the World system,
or the Camera system coincides with the World system

1. While capturing an object with the Kinect

When you use the Kinect to capture an object, what is returned to you from the sensor is the distance from the camera. That means that the z coordinate is already in camera coordinates. By converting x and y using the equations above, you get the point in camera coordinates.

Now, the world coordinate system is defined by you. One common approach is to assume that the camera is located at (0,0,0) of the world coordinate system. In that case, the extrinsics matrix is simply the identity matrix, and the camera coordinates you found are also the world coordinates.

Sidenote: Because the Kinect returns the z in camera coordinates, there is also no need for a transformation from the object coordinate system to the world coordinate system. Let's say, for example, that you had a different camera that captured faces and for each point it returned the distance from the nose (which you considered to be the center of the object coordinate system). In that case, since the values returned would be in the object coordinate system, we would indeed need a rotation and translation matrix to bring them to the camera coordinate system.

2. While calibrating the camera

I suppose you are calibrating the camera with OpenCV, using a calibration board in various poses. The usual way is to assume that the board is stationary and the camera is moving, instead of the opposite (the transformation is the same in both cases). That means that the world coordinate system now coincides with the object coordinate system. This way, for every frame, we find the checkerboard corners and assign them 3d coordinates, doing something like:

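// 3d positions of the board corners in the board's own (object) coordinate system.
// The board is planar, so every corner has z = 0.
std::vector<cv::Point3f> objectCorners;

for (int i = 0; i < noOfCornersInHeight; i++)
{
    for (int j = 0; j < noOfCornersInWidth; j++)
    {
        // Row i and column j are both scaled by the physical size of one square.
        objectCorners.push_back(cv::Point3f(float(i * squareSize), float(j * squareSize), 0.0f));
    }
}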

where noOfCornersInWidth, noOfCornersInHeight and squareSize depend on your calibration board. If for example noOfCornersInWidth = 4, noOfCornersInHeight = 3 and squareSize = 100, we get the 3d points

(  0,0,0)  (  0,100,0)  (  0,200,0)  (  0,300,0)
(100,0,0)  (100,100,0)  (100,200,0)  (100,300,0)
(200,0,0)  (200,100,0)  (200,200,0)  (200,300,0)

So, here our coordinates are actually in the object coordinate system. (We have arbitrarily assumed that the upper left corner of the board is (0,0,0), and the coordinates of the remaining corners are relative to it.) So here we indeed need the rotation and translation to take us from the object (world) system to the camera system. These are the extrinsics that OpenCV returns for each frame.
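To make the role of the extrinsics concrete, here is a minimal sketch of applying a rotation R and translation t to a board corner, and of undoing it. The `Vec3` and `Mat3` types and both function names are hypothetical helpers of mine, written in plain C++ so the example stands on its own. OpenCV hands you the rotation as a rotation vector, which I am assuming you have already converted to a 3x3 matrix (for example with cv::Rodrigues).

```cpp
#include <array>

// Hypothetical helper types for illustration, not part of OpenCV.
struct Vec3 {
    float x, y, z;
};

// Row-major 3x3 rotation matrix: R[row][column].
using Mat3 = std::array<std::array<float, 3>, 3>;

// Object (world) -> camera:  P_camera = R * P_object + t
Vec3 objectToCamera(const Mat3& R, const Vec3& t, const Vec3& p) {
    return { R[0][0] * p.x + R[0][1] * p.y + R[0][2] * p.z + t.x,
             R[1][0] * p.x + R[1][1] * p.y + R[1][2] * p.z + t.y,
             R[2][0] * p.x + R[2][1] * p.y + R[2][2] * p.z + t.z };
}

// Camera -> object (world):  P_object = R^T * (P_camera - t)
// The inverse of a rotation matrix is its transpose, so we read R column-wise.
Vec3 cameraToObject(const Mat3& R, const Vec3& t, const Vec3& p) {
    const Vec3 d{ p.x - t.x, p.y - t.y, p.z - t.z };
    return { R[0][0] * d.x + R[1][0] * d.y + R[2][0] * d.z,
             R[0][1] * d.x + R[1][1] * d.y + R[2][1] * d.z,
             R[0][2] * d.x + R[1][2] * d.y + R[2][2] * d.z };
}
```

With R set to the identity and t set to zero, both functions return the point unchanged, which is exactly the Kinect situation where the camera and world systems are taken to be the same.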

To sum up in the Kinect case:

Camera and World coordinate systems are considered the same, so there is no need for extrinsics.
No Object to World (Camera) transformation is needed, since the value the Kinect returns is already in the Camera system.

Edit 2 (On the coordinate system used):

This is a convention and I think it depends also on which drivers you use and the kind of data you get back. Check for example that, that and that one.

Sidenote: It would help you a lot if you visualized a point cloud and played a little bit with it. You can save your points in a 3d object format (e.g. ply or obj) and then just import them into a program like MeshLab (very easy to use).
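Saving the points is only a few lines if you pick the ASCII flavour of ply. A minimal sketch, assuming you have already collected your points in a vector (the `CloudPoint` struct and the function names are hypothetical, and it only writes positions, no colors or normals):

```cpp
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

// Hypothetical point type for illustration.
struct CloudPoint {
    float x, y, z;
};

// Write the points as an ASCII PLY file: a short text header that declares
// the vertex count and properties, followed by one "x y z" line per point.
void writePly(std::ostream& out, const std::vector<CloudPoint>& points) {
    out << "ply\n"
        << "format ascii 1.0\n"
        << "element vertex " << points.size() << "\n"
        << "property float x\n"
        << "property float y\n"
        << "property float z\n"
        << "end_header\n";
    for (const CloudPoint& p : points) {
        out << p.x << ' ' << p.y << ' ' << p.z << '\n';
    }
}

// Convenience wrapper: returns false if the file could not be opened or written.
bool savePly(const std::string& path, const std::vector<CloudPoint>& points) {
    std::ofstream file(path);
    if (!file) {
        return false;
    }
    writePly(file, points);
    return static_cast<bool>(file);
}
```

Note that this uses the stream's default formatting, which keeps about six significant digits. That is usually plenty for a quick look at the cloud, but you may want to raise the precision if your coordinates are large.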


answered Oct 28 '22 06:10

Sassa

Tags:

opencv

kinect

calibration

camera-calibration

Paul
