
Finding the mapping between video points and real-world points

I am doing car tracking on a video. I am trying to determine how many meters it traveled.

I randomly pulled 7 points from a video frame and made point 1 my origin.

Then, on the corresponding Google Maps view, I calculated the distances of the other 6 points from the origin (delta x and delta y).

Then I ran the following:

    import cv2
    import numpy as np

    # Pixel coordinates of the 7 points picked from the video frame
    pts_src = np.array([[417, 285], [457, 794], [1383, 786], [1557, 423],
                        [1132, 296], [759, 270], [694, 324]])

    # Real-world offsets of the same points from the origin, in meters
    pts_dst = np.array([[0, 0], [-3, -31], [30, -27], [34, 8],
                        [17, 15], [8, 7], [6, 1]])

    h, status = cv2.findHomography(pts_src, pts_dst)

    # point to map, shaped (1, 1, 2) as perspectiveTransform expects
    a = np.array([[[1032, 268]]], dtype='float32')

    # finally, get the mapping
    pointsOut = cv2.perspectiveTransform(a, h)

When I tested the mapping of point 7, the result was wrong.

Am I missing anything? Or am I using the wrong method? Thank you.

Here is the image from the video: [video frame]

I have marked the points, and here is the mapping: [image: table of marked points with pixel and meter coordinates]

The x,y columns are the pixel coordinates on the image. The metered columns are the distances from the origin to each point in meters. Basically, using Google Maps, I converted the geo coordinates to UTM and calculated the x and y differences.
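
For reference, the geo-to-UTM conversion can be done with pyproj; this is just a sketch, and the coordinates and UTM zone below are made-up placeholders, not the actual points:

    from pyproj import Transformer

    # WGS84 lon/lat -> UTM (EPSG:32636 = zone 36N; pick the zone for your site)
    to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32636", always_xy=True)

    lon0, lat0 = 34.7810, 32.0850   # hypothetical origin (point 1)
    lon1, lat1 = 34.7813, 32.0847   # hypothetical second point

    x0, y0 = to_utm.transform(lon0, lat0)
    x1, y1 = to_utm.transform(lon1, lat1)
    print(x1 - x0, y1 - y0)         # delta x, delta y in meters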

I tried to input the 7th point and I got [[[14.682752 9.927497]]] as output, which is quite far off on the x axis.
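
One way to sanity-check the fit is to push all seven calibration points back through the homography and compare against the measured meter values; a minimal sketch:

    import cv2
    import numpy as np

    pts_src = np.array([[417, 285], [457, 794], [1383, 786], [1557, 423],
                        [1132, 296], [759, 270], [694, 324]], dtype=np.float32)
    pts_dst = np.array([[0, 0], [-3, -31], [30, -27], [34, 8],
                        [17, 15], [8, 7], [6, 1]], dtype=np.float32)

    h, status = cv2.findHomography(pts_src, pts_dst)

    # Map every calibration pixel to meters and compare with the ground truth
    mapped = cv2.perspectiveTransform(pts_src.reshape(-1, 1, 2), h).reshape(-1, 2)
    errors = np.linalg.norm(mapped - pts_dst, axis=1)
    print(errors)   # one residual per point, in meters; big values flag bad input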

Any idea if I am doing anything wrong?

asked May 12 '19 by Snake


1 Answer

Cameras are not ideal pinhole cameras and therefore the homography cannot capture the real transform.

For narrow-angle cameras the results are quite close, but for a fish-eye camera they can be way off.

Also, in my experience, the theoretical lens distortion models found in the literature are not very accurate for real-world lenses (multi-element designs that do "strange" things to compensate for barrel/pincushion distortion). Non-spherical lenses are also in use today, where the transformation can be just about anything.

To get accurate results, the only solution I found was to actually map the transformation function using an interpolating spline.
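
For illustration, something like SciPy's RBFInterpolator (requires SciPy >= 1.7; the thin-plate-spline kernel is just one choice) can fit such a pixel-to-world mapping directly from the calibration points; a minimal sketch reusing the question's data:

    import numpy as np
    from scipy.interpolate import RBFInterpolator

    pixels = np.array([[417, 285], [457, 794], [1383, 786], [1557, 423],
                       [1132, 296], [759, 270], [694, 324]], dtype=float)
    meters = np.array([[0, 0], [-3, -31], [30, -27], [34, 8],
                       [17, 15], [8, 7], [6, 1]], dtype=float)

    # A thin-plate spline can absorb distortion that a pure homography cannot,
    # but it only behaves sensibly near well-covered image regions.
    pixel_to_world = RBFInterpolator(pixels, meters, kernel='thin_plate_spline')

    print(pixel_to_world(np.array([[1032.0, 268.0]])))   # estimated meters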

EDIT

In your case I'd say the problem is in the input data: consider the quasi-quadrilateral formed by points 6, 3, 1, 2:

[image: the quasi-quadrilateral marked on the frame]

If the A-D distance is 36.9 meters, how can the B-C distance be 53.8 meters?

Maybe the problem is in how you collected the data, or Google Maps shouldn't be considered reliable for such small measurements.

A solution could be to just measure the relative distances of the points and then find their coordinates on the plane by solving from that distance matrix, as sketched below.
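
For example, classical multidimensional scaling recovers planar coordinates (up to rotation and reflection) from a matrix of pairwise distances; here is a minimal sketch with a made-up unit-square distance matrix:

    import numpy as np

    def coords_from_distances(D):
        # Classical MDS: double-center the squared distances, then take the
        # top-2 eigenpairs of the resulting Gram matrix.
        n = D.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (D ** 2) @ J
        vals, vecs = np.linalg.eigh(B)
        top = np.argsort(vals)[::-1][:2]
        return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))

    s2 = np.sqrt(2)
    D = np.array([[0, 1, s2, 1],      # pairwise distances of a unit square
                  [1, 0, 1, s2],
                  [s2, 1, 0, 1],
                  [1, s2, 1, 0]], dtype=float)
    print(coords_from_distances(D))   # a unit square, up to rotation/reflection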

EDIT

To check, I wrote a simple non-linear least squares solver (it works by stochastic hill climbing) and used a picture of my floor to test it. After a few seconds (it's written in Python, so speed is not its best feature) it can solve a general pinhole planar camera equation:

    pixel_x = (world_x*m11 + world_y*m12 + m13) / w
    pixel_y = (world_x*m21 + world_y*m22 + m23) / w
    w = world_x*m31 + world_y*m32 + m33

    m11**2 + m12**2 + m13**2 = 1

and I can get a camera with less than 4 pixels of maximum error (on a 4K image).

[image: solver result on the floor test picture]
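
That solver isn't reproduced here, but a rough equivalent of the fitting step (a sketch using scipy.optimize.least_squares instead of stochastic hill climbing, seeded with OpenCV's linear estimate) could look like this:

    import cv2
    import numpy as np
    from scipy.optimize import least_squares

    def residuals(m, world, pixel):
        # Planar camera model: pixel = (H @ [world_x, world_y, 1]) / w
        H = m.reshape(3, 3)
        w = world @ H[2, :2] + H[2, 2]
        px = (world @ H[0, :2] + H[0, 2]) / w
        py = (world @ H[1, :2] + H[1, 2]) / w
        return np.concatenate([px - pixel[:, 0], py - pixel[:, 1]])

    # The question's correspondences: world in meters, pixel in image coordinates
    world = np.array([[0, 0], [-3, -31], [30, -27], [34, 8],
                      [17, 15], [8, 7], [6, 1]], dtype=float)
    pixel = np.array([[417, 285], [457, 794], [1383, 786], [1557, 423],
                      [1132, 296], [759, 270], [694, 324]], dtype=float)

    H0, _ = cv2.findHomography(world, pixel)      # linear estimate as a seed
    fit = least_squares(residuals, H0.ravel(), args=(world, pixel))
    print(np.abs(fit.fun).max())                  # maximum residual in pixels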

With YOUR data, however, I cannot get an error smaller than 120 pixels. The best matrix I found for your data is:

    0.0704790534896005     -0.0066904288370295524   0.9974908226049937
    0.013902632209214609   -0.03214426521221147     0.6680756144949469
    6.142954035443663e-06  -7.361135651590592e-06   0.002007213927080277

Solving your data using only points 1, 2, 3, and 6, I of course get an exact numeric solution (with four points in general position there is exactly one planar camera), but the image is clearly completely wrong (the grid should lie on the street plane):

[image: grid from the four-point solution, not lying on the street plane]

answered Nov 02 '22 by 6502