Given a 2d picture of a rectangle distorted by perspective:
I know that the shape was originally a rectangle, but I do not know its original size.
If I know the pixel coordinates of the corners in this picture, how can I calculate the original proportions, i.e. the quotient ( width / height ) of the rectangle?
(background: the goal is to automatically undistort photos of rectangular documents, edge detection will probably be done with hough transform)
There has been some discussion on whether it is possible at all to determine the width:height ratio with the information given. My naive thought was that it must be possible, since I can think of no way to project for example a 1:4 rectangle onto the quadrangle depicted above. The ratio appears clearly close to 1:1, so there should be a way to determine it mathematically. I have however no proof for this beyond my intuitive guess.
I have not yet fully understood the arguments presented below, but I think there must be some implicit assumption that we are missing here and that is interpreted differently.
However, after hours of searching, I have finally found some papers relevant to the problem. I am struggling to understand the math used in there, so far without success. Particularly the first paper seems to discuss exactly what I wanted to do, unfortunately without code examples and very dense math.
Zhengyou Zhang , Li-Wei He, "Whiteboard scanning and image enhancement" http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf p.11
"Because of the perspective distortion, the image of a rectangle appears to be a quadrangle. However, since we know that it is a rectangle in space, we are able to estimate both the camera’s focal length and the rectangle’s aspect ratio."
ROBERT M. HARALICK "Determining camera parameters from the perspective projection of a rectangle" http://portal.acm.org/citation.cfm?id=87146
"we show how to use the 2D perspective projection of a rectangle of unknown size and position in 3D space to determine the camera look angle parameters relative to the plans of the rectangle."
Here is my attempt at answering my question after reading the paper
I manipulated the equations for some time in SAGE, and came up with this pseudo-code in c-style:
// in case it matters: licensed under GPLv2 or later // legend: // sqr(x) = x*x // sqrt(x) = square root of x // let m1x,m1y ... m4x,m4y be the (x,y) pixel coordinates // of the 4 corners of the detected quadrangle // i.e. (m1x, m1y) are the cordinates of the first corner, // (m2x, m2y) of the second corner and so on. // let u0, v0 be the pixel coordinates of the principal point of the image // for a normal camera this will be the center of the image, // i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2 // This assumption does not hold if the image has been cropped asymmetrically // first, transform the image so the principal point is at (0,0) // this makes the following equations much easier m1x = m1x - u0; m1y = m1y - v0; m2x = m2x - u0; m2y = m2y - v0; m3x = m3x - u0; m3y = m3y - v0; m4x = m4x - u0; m4y = m4y - v0; // temporary variables k2, k3 double k2 = ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x) / ((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) ; double k3 = ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x) / ((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) ; // f_squared is the focal length of the camera, squared // if k2==1 OR k3==1 then this equation is not solvable // if the focal length is known, then this equation is not needed // in that case assign f_squared= sqr(focal_length) double f_squared = -((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x - m1x)) / ((k3 - 1)*(k2 - 1)) ; //The width/height ratio of the original rectangle double whRatio = sqrt( (sqr(k2 - 1) + sqr(k2*m2y - m1y)/f_squared + sqr(k2*m2x - m1x)/f_squared) / (sqr(k3 - 1) + sqr(k3*m3y - m1y)/f_squared + sqr(k3*m3x - m1x)/f_squared) ) ; // if k2==1 AND k3==1, then the focal length equation is not solvable // but the focal length is not needed to calculate the ratio. // I am still trying to figure out under which circumstances k2 and k3 become 1 // but it seems to be when the rectangle is not distorted by perspective, // i.e. viewed straight on. Then the equation is obvious: if (k2==1 && k3==1) whRatio = sqrt( (sqr(m2y-m1y) + sqr(m2x-m1x)) / (sqr(m3y-m1y) + sqr(m3x-m1x)) // After testing, I found that the above equations // actually give the height/width ratio of the rectangle, // not the width/height ratio. // If someone can find the error that caused this, // I would be most grateful. // until then: whRatio = 1/whRatio;
The following is code in SAGE. It can be accessed online at http://www.sagenb.org/home/pub/704/. (Sage is really useful in solving equations, and useable in any browser, check it out)
# CALCULATING THE ASPECT RATIO OF A RECTANGLE DISTORTED BY PERSPECTIVE # # BIBLIOGRAPHY: # [zhang-single]: "Single-View Geometry of A Rectangle # With Application to Whiteboard Image Rectification" # by Zhenggyou Zhang # http://research.microsoft.com/users/zhang/Papers/WhiteboardRectification.pdf # pixel coordinates of the 4 corners of the quadrangle (m1, m2, m3, m4) # see [zhang-single] figure 1 m1x = var('m1x') m1y = var('m1y') m2x = var('m2x') m2y = var('m2y') m3x = var('m3x') m3y = var('m3y') m4x = var('m4x') m4y = var('m4y') # pixel coordinates of the principal point of the image # for a normal camera this will be the center of the image, # i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2 # This assumption does not hold if the image has been cropped asymmetrically u0 = var('u0') v0 = var('v0') # pixel aspect ratio; for a normal camera pixels are square, so s=1 s = var('s') # homogenous coordinates of the quadrangle m1 = vector ([m1x,m1y,1]) m2 = vector ([m2x,m2y,1]) m3 = vector ([m3x,m3y,1]) m4 = vector ([m4x,m4y,1]) # the following equations are later used in calculating the the focal length # and the rectangle's aspect ratio. # temporary variables: k2, k3, n2, n3 # see [zhang-single] Equation 11, 12 k2_ = m1.cross_product(m4).dot_product(m3) / m2.cross_product(m4).dot_product(m3) k3_ = m1.cross_product(m4).dot_product(m2) / m3.cross_product(m4).dot_product(m2) k2 = var('k2') k3 = var('k3') # see [zhang-single] Equation 14,16 n2 = k2 * m2 - m1 n3 = k3 * m3 - m1 # the focal length of the camera. f = var('f') # see [zhang-single] Equation 21 f_ = sqrt( -1 / ( n2[2]*n3[2]*s^2 ) * ( ( n2[0]*n3[0] - (n2[0]*n3[2]+n2[2]*n3[0])*u0 + n2[2]*n3[2]*u0^2 )*s^2 + ( n2[1]*n3[1] - (n2[1]*n3[2]+n2[2]*n3[1])*v0 + n2[2]*n3[2]*v0^2 ) ) ) # standard pinhole camera matrix # see [zhang-single] Equation 1 A = matrix([[f,0,u0],[0,s*f,v0],[0,0,1]]) #the width/height ratio of the original rectangle # see [zhang-single] Equation 20 whRatio = sqrt ( (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / (n3*A.transpose()^(-1) * A^(-1)*n3.transpose()) )
The simplified equations in the c-code where determined by
print "simplified equations, assuming u0=0, v0=0, s=1" print "k2 := ", k2_ print "k3 := ", k3_ print "f := ", f_(u0=0,v0=0,s=1) print "whRatio := ", whRatio(u0=0,v0=0,s=1) simplified equations, assuming u0=0, v0=0, s=1 k2 := ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) k3 := ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) f := sqrt(-((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x - m1x))/((k3 - 1)*(k2 - 1))) whRatio := sqrt(((k2 - 1)^2 + (k2*m2y - m1y)^2/f^2 + (k2*m2x - m1x)^2/f^2)/((k3 - 1)^2 + (k3*m3y - m1y)^2/f^2 + (k3*m3x - m1x)^2/f^2)) print "Everything in one equation:" print "whRatio := ", whRatio(f=f_)(k2=k2_,k3=k3_)(u0=0,v0=0,s=1) Everything in one equation: whRatio := sqrt(((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - 1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y)^2/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - 1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)^2/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)) - (((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)^2)/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - 1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)^2/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - 1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)^2/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)) - (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - 1)^2))
# some testing: # - choose a random rectangle, # - project it onto a random plane, # - insert the corners in the above equations, # - check if the aspect ratio is correct. from sage.plot.plot3d.transform import rotate_arbitrary #redundandly random rotation matrix rand_rotMatrix = \ rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\ rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\ rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) #random translation vector rand_transVector = vector((uniform(-10,10),uniform(-10,10),uniform(-10,10))).transpose() #random rectangle parameters rand_width =uniform(0.1,10) rand_height=uniform(0.1,10) rand_left =uniform(-10,10) rand_top =uniform(-10,10) #random focal length and principal point rand_f = uniform(0.1,100) rand_u0 = uniform(-100,100) rand_v0 = uniform(-100,100) # homogenous standard pinhole projection, see [zhang-single] Equation 1 hom_projection = A * rand_rotMatrix.augment(rand_transVector) # construct a random rectangle in the plane z=0, then project it randomly rand_m1hom = hom_projection*vector((rand_left ,rand_top ,0,1)).transpose() rand_m2hom = hom_projection*vector((rand_left ,rand_top+rand_height,0,1)).transpose() rand_m3hom = hom_projection*vector((rand_left+rand_width,rand_top ,0,1)).transpose() rand_m4hom = hom_projection*vector((rand_left+rand_width,rand_top+rand_height,0,1)).transpose() #change type from 1x3 matrix to vector rand_m1hom = rand_m1hom.column(0) rand_m2hom = rand_m2hom.column(0) rand_m3hom = rand_m3hom.column(0) rand_m4hom = rand_m4hom.column(0) #normalize rand_m1hom = rand_m1hom/rand_m1hom[2] rand_m2hom = rand_m2hom/rand_m2hom[2] rand_m3hom = rand_m3hom/rand_m3hom[2] rand_m4hom = rand_m4hom/rand_m4hom[2] #substitute random values for f, u0, v0 rand_m1hom = rand_m1hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0) rand_m2hom = rand_m2hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0) rand_m3hom = rand_m3hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0) rand_m4hom = rand_m4hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0) # printing the randomly choosen values print "ground truth: f=", rand_f, "; ratio=", rand_width/rand_height # substitute all the variables in the equations: print "calculated: f= ",\ f_(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)( m1x=rand_m1hom[0],m1y=rand_m1hom[1], m2x=rand_m2hom[0],m2y=rand_m2hom[1], m3x=rand_m3hom[0],m3y=rand_m3hom[1], m4x=rand_m4hom[0],m4y=rand_m4hom[1], ),"; 1/ratio=", \ 1/whRatio(f=f_)(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)( m1x=rand_m1hom[0],m1y=rand_m1hom[1], m2x=rand_m2hom[0],m2y=rand_m2hom[1], m3x=rand_m3hom[0],m3y=rand_m3hom[1], m4x=rand_m4hom[0],m4y=rand_m4hom[1], ) print "k2 = ", k2_( m1x=rand_m1hom[0],m1y=rand_m1hom[1], m2x=rand_m2hom[0],m2y=rand_m2hom[1], m3x=rand_m3hom[0],m3y=rand_m3hom[1], m4x=rand_m4hom[0],m4y=rand_m4hom[1], ), "; k3 = ", k3_( m1x=rand_m1hom[0],m1y=rand_m1hom[1], m2x=rand_m2hom[0],m2y=rand_m2hom[1], m3x=rand_m3hom[0],m3y=rand_m3hom[1], m4x=rand_m4hom[0],m4y=rand_m4hom[1], ) # ATTENTION: testing revealed, that the whRatio # is actually the height/width ratio, # not the width/height ratio # This contradicts [zhang-single] # if anyone can find the error that caused this, I'd be grateful ground truth: f= 72.1045134124554 ; ratio= 3.46538779959142 calculated: f= 72.1045134125 ; 1/ratio= 3.46538779959 k2 = 0.99114614987 ; k3 = 1.57376280159
Update
After reading your update, and looking at the first reference (Whiteboard scanning and image enhancement), I see where the missing point is.
The input data of the problem is a quadruple (A,B,C,D), AND the center O of the projected image. In the article, it corresponds to the assumption u0=v0=0. Adding this point, the problem becomes constrained enough to get the aspect ratio of the rectangle.
The problem is then restated as follows: Given a quadruple (A,B,C,D) in the Z=0 plane, find the eye position E(0,0,h), h>0 and a 3D plane P such that the projection of (A,B,C,D) on P is a rectangle.
Note that P is determined by E: to get a parallelogram, P must contain parallels to (EU) and (EV), where U=(AB)x(CD) and V=(AD)x(BC).
Experimentally, it seems that this problem has in general one unique solution, corresponding to a unique value of the w/h ratio of the rectangle.
Previous Post
No, you can't determine the rectangle ratio from the projection.
In the general case, a quadruple (A,B,C,D) of four non collinear points of the Z=0 plane is the projection of infinitely many rectangles, with infinitely many width/height ratios.
Consider the two vanishing points U, intersection of (AB) and (CD) and V, intersection of (AD) and (BC), and the point I, intersection of the two diagonals (AC) and (BD). To project as ABCD, a parallelogram of center I must lie on a plane containing the line parallel to (UV) through point I. On one such plane, you can find many rectangles projecting to ABCD, all with a different w/h ratio.
See these two images done with Cabri 3D. In the two cases ABCD is unchanged (on the gray Z=0 plane), and the blue plane containing the rectangle is not changed either. The green line partially hidden is the (UV) line and the visible green line is parallel to it and contains I.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With