I am working on an application where I need to rectify an image taken from a mobile camera platform. The platform measures roll, pitch and yaw angles, and I want to make it look like the image is taken from directly above, by some sort of transform from this information.
In other words, I want a perfect square lying flat on the ground, photographed from afar with some camera orientation, to be transformed, so that the square is perfectly symmetrical afterwards.
I have been trying to do this through OpenCV(C++) and Matlab, but I seem to be missing something fundamental about how this is done.
In Matlab, I have tried the following:
%% Transform perspective
img = imread('my_favourite_image.jpg');
R = R_z(yaw_angle)*R_y(pitch_angle)*R_x(roll_angle);
tform = projective2d(R);
outputImage = imwarp(img,tform);
figure(1), imshow(outputImage);
Where R_z/y/x are the standard rotational matrices (implemented with degrees).
For some yaw-rotation, it all works just fine:
R = R_z(10)*R_y(0)*R_x(0);
Which gives the result:
If I try to rotate the image by the same amount about the X- or Y- axes, I get results like this:
R = R_z(10)*R_y(0)*R_x(10);
However, if I rotate by 10 degrees, divided by some huge number, it starts to look OK. But then again, this is a result that has no research value what so ever:
R = R_z(10)*R_y(0)*R_x(10/1000);
Can someone please help me understand why rotating about the X- or Y-axes makes the transformation go wild? Is there any way of solving this without dividing by some random number and other magic tricks? Is this maybe something that can be solved using Euler parameters of some sort? Any help will be highly appreciated!
Update: Full setup and measurements
For completeness, the full test code and initial image has been added, as well as the platforms Euler angles:
Code:
%% Transform perspective
function [] = main()
img = imread('some_image.jpg');
R = R_z(0)*R_y(0)*R_x(10);
tform = projective2d(R);
outputImage = imwarp(img,tform);
figure(1), imshow(outputImage);
end
%% Matrix for Yaw-rotation about the Z-axis
function [R] = R_z(psi)
R = [cosd(psi) -sind(psi) 0;
sind(psi) cosd(psi) 0;
0 0 1];
end
%% Matrix for Pitch-rotation about the Y-axis
function [R] = R_y(theta)
R = [cosd(theta) 0 sind(theta);
0 1 0 ;
-sind(theta) 0 cosd(theta) ];
end
%% Matrix for Roll-rotation about the X-axis
function [R] = R_x(phi)
R = [1 0 0;
0 cosd(phi) -sind(phi);
0 sind(phi) cosd(phi)];
end
The initial image:
Camera platform measurements in the BODY coordinate frame:
Roll: -10
Pitch: -30
Yaw: 166 (angular deviation from north)
From what I understand the Yaw-angle is not directly relevant to the transformation. I might, however, be wrong about this.
Additional info:
I would like specify that the environment in which the setup will be used contains no lines (oceanic photo) that can reliably used as a reference (the horizon will usually not be in the picture). Also the square in the initial image is merely used as a measure to see if the transformation is correct, and will not be there in a real scenario.
What are Roll, Pitch, and Yaw? Imagine three lines running through an airplane and intersecting at right angles at the airplane's center of gravity. Rotation around the front-to-back axis is called roll. Rotation around the side-to-side axis is called pitch. Rotation around the vertical axis is called yaw.
pitch = atan2( -r20, sqrt(r21*r21+r22*r22) ); yaw = atan2( r10, r00 ); roll = atan2( r21, r22 );
Tilting a camera means rotating the camera up or down; panning a camera means rotating the camera left or right. The scene in the camera's view moves in the opposite direction. The angle of rotation up or down is also referred to as pitch; the angle of rotation left or right is also referred to as yaw.
Pitch is a counterclockwise rotation of m about the y-axis. Roll is a counterclockwise rotation of s about the x-axis. Source publication. A strategy for heterogeneous multi-robot self-localization system.
So, this is what I ended up doing: I figured that unless you are actually dealing with 3D images, rectifying the perspective of a photo is a 2D operation. With this in mind, I replaced the z-axis values of the transformation matrix with zeros and ones, and applied a 2D Affine transformation to the image.
Rotation of the initial image (see initial post) with measured Roll = -10 and Pitch = -30 was done in the following manner:
R_rotation = R_y(-60)*R_x(10);
R_2d = [ R_rot(1,1) R_rot(1,2) 0;
R_rot(2,1) R_rot(2,2) 0;
0 0 1 ]
This implies a rotation of the camera platform to a virtual camera orientation where the camera is placed above the scene, pointing straight downwards. Note the values used for roll and pitch in the matrix above.
Additionally, if rotating the image so that is aligned with the platform heading, a rotation about the z-axis might be added, giving:
R_rotation = R_y(-60)*R_x(10)*R_z(some_heading);
R_2d = [ R_rot(1,1) R_rot(1,2) 0;
R_rot(2,1) R_rot(2,2) 0;
0 0 1 ]
Note that this does not change the actual image - it only rotates it.
As a result, the initial image rotated about the Y- and X-axes looks like:
The full code for doing this transformation, as displayed above, was:
% Load image
img = imread('initial_image.jpg');
% Full rotation matrix. Z-axis included, but not used.
R_rot = R_y(-60)*R_x(10)*R_z(0);
% Strip the values related to the Z-axis from R_rot
R_2d = [ R_rot(1,1) R_rot(1,2) 0;
R_rot(2,1) R_rot(2,2) 0;
0 0 1 ];
% Generate transformation matrix, and warp (matlab syntax)
tform = affine2d(R_2d);
outputImage = imwarp(img,tform);
% Display image
figure(1), imshow(outputImage);
%*** Rotation Matrix Functions ***%
%% Matrix for Yaw-rotation about the Z-axis
function [R] = R_z(psi)
R = [cosd(psi) -sind(psi) 0;
sind(psi) cosd(psi) 0;
0 0 1];
end
%% Matrix for Pitch-rotation about the Y-axis
function [R] = R_y(theta)
R = [cosd(theta) 0 sind(theta);
0 1 0 ;
-sind(theta) 0 cosd(theta) ];
end
%% Matrix for Roll-rotation about the X-axis
function [R] = R_x(phi)
R = [1 0 0;
0 cosd(phi) -sind(phi);
0 sind(phi) cosd(phi)];
end
Thank you for the support, I hope this helps someone!
I think you can derive transformation this way:
1) Let you have four 3d-points A(-1,-1,0), B(1,-1,0), C(1,1,0) and D(-1,1,0). You can take any 4 noncollinear points. They not related to image.
2) You have transformation matrix, so you can set your camera by multiplying points coords by transformation matrix. And you'll get 3d coords relative to camera position/direction.
3) You need to get projection of your points to screen plane. The simpliest way is to use ortographic projection (simply ignore depth coordinate). On this stage you've got 2D projections of transformed points.
4) Once you have 2 sets of 2D points coordinates (the set from step 1 without 3-rd coordinate and the set from step 3), you can compute homography matrix in standard way.
5) Apply inverse homograhy transformation to your image.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With