I'm new to this field and I'm trying to model a simple scene in 3D out of 2D images, and I don't have any information about the cameras. I know that there are 3 options:

1. I have two images and I know the model of my camera (intrinsics), which I can load from an XML file, for instance with loadXMLFromFile() => stereoRectify() => reprojectImageTo3D() (see the sketch after this list).

2. I don't have the intrinsics, but I can calibrate my camera => stereoCalibrate() => stereoRectify() => reprojectImageTo3D()

3. I can't calibrate the camera (this is my case, because I don't have the camera that took the two images). In this case I need to find pairs of keypoints on both images with SURF or SIFT for instance (I can use any blob detector actually), then compute the descriptors of these keypoints, then match the keypoints from the right and left image according to their descriptors, and then find the fundamental matrix from them. The processing is much harder and would look like this: findFundamentalMat() from these pairs => stereoRectifyUncalibrated() => reprojectImageTo3D()
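For reference, the calibrated pipeline (option 1) would look roughly like the sketch below. It is only a minimal sketch: the file name stereo_params.xml, the node names M1, D1, M2, D2, R, T and the helper function name reconstructCalibrated are assumptions, not something prescribed by OpenCV.

#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>

// Sketch of the calibrated pipeline (option 1). The file and node names
// (M1, D1, M2, D2, R, T) are assumptions; use whatever your calibration wrote out.
cv::Mat reconstructCalibrated(const cv::Mat& disparity, cv::Size imageSize)
{
    cv::FileStorage fs("stereo_params.xml", cv::FileStorage::READ);
    cv::Mat M1, D1, M2, D2, R, T;
    fs["M1"] >> M1; fs["D1"] >> D1;
    fs["M2"] >> M2; fs["D2"] >> D2;
    fs["R"]  >> R;  fs["T"]  >> T;

    // Rectification transforms, projection matrices and the 4x4 reprojection matrix Q
    cv::Mat R1, R2, P1, P2, Q;
    cv::stereoRectify(M1, D1, M2, D2, imageSize, R, T, R1, R2, P1, P2, Q);

    // (Both images would be remapped with initUndistortRectifyMap()/remap()
    //  and a disparity map computed with StereoBM/StereoSGBM before this call.)
    cv::Mat xyz;
    cv::reprojectImageTo3D(disparity, xyz, Q, true);
    return xyz;   // per-pixel (X, Y, Z) coordinates
}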
I'm using the last approach and my questions are:

1) Is it right?

2) If it's OK, I have a doubt about the last step, stereoRectifyUncalibrated() => reprojectImageTo3D(). The signature of the reprojectImageTo3D() function is:

void reprojectImageTo3D(InputArray disparity, OutputArray _3dImage, InputArray Q, bool handleMissingValues=false, int ddepth=-1)

and in my code I call it as cv::reprojectImageTo3D(imgDisparity8U, xyz, Q, true).
Parameters:

disparity – Input single-channel 8-bit unsigned, 16-bit signed, 32-bit signed or 32-bit floating-point disparity image.

_3dImage – Output 3-channel floating-point image of the same size as disparity. Each element of _3dImage(x,y) contains the 3D coordinates of the point (x,y) computed from the disparity map.

Q – 4x4 perspective transformation matrix that can be obtained with stereoRectify().

handleMissingValues – Indicates whether the function should handle missing values (i.e. points where the disparity was not computed). If handleMissingValues=true, then pixels with the minimal disparity that corresponds to the outliers (see StereoBM::operator()) are transformed to 3D points with a very large Z value (currently set to 10000).

ddepth – The optional output array depth. If it is -1, the output image will have CV_32F depth. ddepth can also be set to CV_16S, CV_32S or CV_32F.

How can I get the Q matrix? Is it possible to obtain the Q matrix with F, H1 and H2, or in another way?
3) Is there another way to obtain the xyz coordinates without calibrating the cameras?
My code is:
#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/contrib/contrib.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <stdio.h>
#include <iostream>
#include <vector>
#include <conio.h>
#include <opencv/cv.h>
#include <opencv/cxcore.h>
#include <opencv/cvaux.h>

using namespace cv;
using namespace std;

int main(int argc, char *argv[]){

    // Read the images
    Mat imgLeft  = imread( argv[1], CV_LOAD_IMAGE_GRAYSCALE );
    Mat imgRight = imread( argv[2], CV_LOAD_IMAGE_GRAYSCALE );

    // check
    if (!imgLeft.data || !imgRight.data)
        return 0;

    // 1] find pair keypoints on both images (SURF, SIFT) ::::::::::::::::::::::::::::

    // vectors of keypoints
    std::vector<cv::KeyPoint> keypointsLeft;
    std::vector<cv::KeyPoint> keypointsRight;

    // Construct the SIFT feature detector object
    cv::SiftFeatureDetector sift(
        0.01,  // feature threshold
        10);   // threshold to reduce sensitivity to lines

    // Detection of the SIFT features
    sift.detect(imgLeft, keypointsLeft);
    sift.detect(imgRight, keypointsRight);

    std::cout << "Number of SIFT points (1): " << keypointsLeft.size()  << std::endl;
    std::cout << "Number of SIFT points (2): " << keypointsRight.size() << std::endl;

    // 2] compute descriptors of these keypoints (SURF, SIFT) ::::::::::::::::::::::::

    // Construction of the SURF descriptor extractor
    cv::SurfDescriptorExtractor surfDesc;

    // Extraction of the SURF descriptors
    cv::Mat descriptorsLeft, descriptorsRight;
    surfDesc.compute(imgLeft,  keypointsLeft,  descriptorsLeft);
    surfDesc.compute(imgRight, keypointsRight, descriptorsRight);

    std::cout << "descriptor matrix size: " << descriptorsLeft.rows << " by " << descriptorsLeft.cols << std::endl;

    // 3] match keypoints from right and left image according to their descriptors (BruteForce, Flann based approaches)

    // Construction of the matcher
    cv::BruteForceMatcher<cv::L2<float> > matcher;

    // Match the two image descriptors
    std::vector<cv::DMatch> matches;
    matcher.match(descriptorsLeft, descriptorsRight, matches);

    std::cout << "Number of matched points: " << matches.size() << std::endl;

    // 4] find the fundamental mat ::::::::::::::::::::::::::::::::::::::::::::::::::

    // Convert 1 vector of keypoints into
    // 2 vectors of Point2f to compute the F matrix
    // with the cv::findFundamentalMat() function
    std::vector<int> pointIndexesLeft;
    std::vector<int> pointIndexesRight;
    for (std::vector<cv::DMatch>::const_iterator it = matches.begin(); it != matches.end(); ++it) {

        // Get the indexes of the selected matched keypoints
        pointIndexesLeft.push_back(it->queryIdx);
        pointIndexesRight.push_back(it->trainIdx);
    }

    // Convert keypoints into Point2f
    std::vector<cv::Point2f> selPointsLeft, selPointsRight;
    cv::KeyPoint::convert(keypointsLeft,  selPointsLeft,  pointIndexesLeft);
    cv::KeyPoint::convert(keypointsRight, selPointsRight, pointIndexesRight);

    /* check by drawing the points
    std::vector<cv::Point2f>::const_iterator it = selPointsLeft.begin();
    while (it != selPointsLeft.end()) {

        // draw a circle at each corner location
        cv::circle(imgLeft, *it, 3, cv::Scalar(255,255,255), 2);
        ++it;
    }

    it = selPointsRight.begin();
    while (it != selPointsRight.end()) {

        // draw a circle at each corner location
        cv::circle(imgRight, *it, 3, cv::Scalar(255,255,255), 2);
        ++it;
    } */

    // Compute F matrix from n >= 8 matches
    cv::Mat fundemental = cv::findFundamentalMat(
        cv::Mat(selPointsLeft),   // points in first image
        cv::Mat(selPointsRight),  // points in second image
        CV_FM_RANSAC);            // RANSAC method

    std::cout << "F-Matrix size= " << fundemental.rows << "," << fundemental.cols << std::endl;

    /* draw the left points' corresponding epipolar lines in the right image
    std::vector<cv::Vec3f> linesLeft;
    cv::computeCorrespondEpilines(
        cv::Mat(selPointsLeft),  // image points
        1,                       // in image 1 (can also be 2)
        fundemental,             // F matrix
        linesLeft);              // vector of epipolar lines

    // for all epipolar lines
    for (vector<cv::Vec3f>::const_iterator it = linesLeft.begin(); it != linesLeft.end(); ++it) {

        // draw the epipolar line between first and last column
        cv::line(imgRight, cv::Point(0, -(*it)[2]/(*it)[1]), cv::Point(imgRight.cols, -((*it)[2]+(*it)[0]*imgRight.cols)/(*it)[1]), cv::Scalar(255,255,255));
    }

    // draw the right points' corresponding epipolar lines in the left image
    std::vector<cv::Vec3f> linesRight;
    cv::computeCorrespondEpilines(cv::Mat(selPointsRight), 2, fundemental, linesRight);
    for (vector<cv::Vec3f>::const_iterator it = linesRight.begin(); it != linesRight.end(); ++it) {

        // draw the epipolar line between first and last column
        cv::line(imgLeft, cv::Point(0, -(*it)[2]/(*it)[1]), cv::Point(imgLeft.cols, -((*it)[2]+(*it)[0]*imgLeft.cols)/(*it)[1]), cv::Scalar(255,255,255));
    }

    // Display the images with points and epipolar lines
    cv::namedWindow("Right Image Epilines");
    cv::imshow("Right Image Epilines", imgRight);
    cv::namedWindow("Left Image Epilines");
    cv::imshow("Left Image Epilines", imgLeft);
    */

    // 5] stereoRectifyUncalibrated() ::::::::::::::::::::::::::::::::::::::::::::::::

    // H1, H2 – the output rectification homography matrices for the first and second images
    cv::Mat H1(4, 4, imgRight.type());
    cv::Mat H2(4, 4, imgRight.type());
    cv::stereoRectifyUncalibrated(selPointsRight, selPointsLeft, fundemental, imgRight.size(), H1, H2);

    // create the images in which we will save our disparities
    Mat imgDisparity16S = Mat( imgLeft.rows, imgLeft.cols, CV_16S );
    Mat imgDisparity8U  = Mat( imgLeft.rows, imgLeft.cols, CV_8UC1 );

    // Call the constructor for StereoBM
    int ndisparities = 16*5;  // < Range of disparity >
    int SADWindowSize = 5;    // < Size of the block window > Must be odd. It is the
                              // size of the averaging window used to match pixel
                              // blocks (larger values mean better robustness to
                              // noise, but yield blurry disparity maps)
    StereoBM sbm( StereoBM::BASIC_PRESET, ndisparities, SADWindowSize );

    // Calculate the disparity image
    sbm( imgLeft, imgRight, imgDisparity16S, CV_16S );

    // Check its extreme values
    double minVal; double maxVal;
    minMaxLoc( imgDisparity16S, &minVal, &maxVal );
    printf("Min disp: %f Max value: %f \n", minVal, maxVal);

    // Display it as a CV_8UC1 image
    imgDisparity16S.convertTo( imgDisparity8U, CV_8UC1, 255/(maxVal - minVal));
    namedWindow( "windowDisparity", CV_WINDOW_NORMAL );
    imshow( "windowDisparity", imgDisparity8U );

    // 6] reprojectImageTo3D() :::::::::::::::::::::::::::::::::::::::::::::::::::::

    //Mat xyz;
    //cv::reprojectImageTo3D(imgDisparity8U, xyz, Q, true);

    //How can I get the Q matrix? Is it possible to obtain the Q matrix with
    //F, H1 and H2 or in another way?
    //Is there another way to obtain the xyz coordinates?

    cv::waitKey();
    return 0;
}
stereoRectifyUncalibrated() computes only a planar perspective transformation, not a rectification transformation in object space. To extract the Q matrix you would need to convert this planar transformation into an object-space transformation, and I think some of the camera calibration parameters (like the camera intrinsics) are required for that. There may be ongoing research topics on this subject.
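For what it's worth, the Q matrix that stereoRectify() produces has a simple structure, so if you can estimate the focal length, the principal point and the baseline by some other means, you can assemble it by hand. The sketch below only illustrates that layout; f, cx, cy, cxr and Tx are assumed/estimated values, not something you can derive from F, H1 and H2 alone:

// Hand-built Q following the layout stereoRectify() uses:
//     [ 1  0   0      -cx            ]
//     [ 0  1   0      -cy            ]
// Q = [ 0  0   0       f             ]
//     [ 0  0  -1/Tx   (cx - cx')/Tx  ]
// f, cx, cy : focal length and principal point in pixels (assumed values)
// Tx        : baseline between the two cameras
// cxr (cx') : principal point x of the second camera
double f = 800.0, cx = 320.0, cy = 240.0, cxr = 320.0, Tx = 0.1;  // assumptions
cv::Mat Q = (cv::Mat_<double>(4, 4) <<
    1, 0,  0,       -cx,
    0, 1,  0,       -cy,
    0, 0,  0,        f,
    0, 0, -1.0/Tx,  (cx - cxr)/Tx);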
You may have to add some steps for estimating the camera intrinsics and extracting the relative orientation of the cameras to make your pipeline work properly. I think the camera calibration parameters are vital for extracting a proper 3D structure of the scene, unless an active lighting method is used.

Also, bundle block adjustment based solutions are needed for refining all of the estimated values into more accurate ones.
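If you are willing to guess the intrinsics (focal length from EXIF data, principal point at the image centre), one common way to get the relative orientation mentioned above is to turn F into an essential matrix and decompose it. A rough sketch, where K is an assumed camera matrix, fundemental is the F matrix from the question's code, and the translation is recovered only up to scale:

// Assumed intrinsics for both images; these numbers are placeholders.
cv::Mat K = (cv::Mat_<double>(3, 3) << 800,   0, 320,
                                         0, 800, 240,
                                         0,   0,   1);

// Essential matrix from the fundamental matrix: E = K^T * F * K
cv::Mat E = K.t() * fundemental * K;

// Decompose E = U diag(1,1,0) V^T; R = U*W*V^T (or U*W^T*V^T), t = +/- last column of U.
// This gives four (R, t) candidates; the valid one places the triangulated points
// in front of both cameras (cheirality check, not shown here).
cv::SVD svd(E);
cv::Mat W = (cv::Mat_<double>(3, 3) << 0, -1, 0,
                                       1,  0, 0,
                                       0,  0, 1);
cv::Mat R = svd.u * W * svd.vt;   // if det(R) < 0, negate E (or R) and redo
cv::Mat t = svd.u.col(2);         // baseline direction only; its length is unknown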
The procedure looks OK to me.

As far as I know, regarding image-based 3D modelling, cameras are either explicitly calibrated or implicitly calibrated. You don't want to explicitly calibrate the camera, but you will end up making use of those quantities anyway. Matching corresponding point pairs is definitely a heavily used approach.
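One concrete remark on the posted code: stereoRectifyUncalibrated() only returns the homographies; the images still have to be warped with them before block matching. A minimal sketch of that missing step (variable names taken from the question's code; note that H1 belongs to whichever image's points were passed first, which in that call is the right image):

// Apply the rectification homographies before running StereoBM
cv::Mat rectRight, rectLeft;
cv::warpPerspective(imgRight, rectRight, H1, imgRight.size());
cv::warpPerspective(imgLeft,  rectLeft,  H2, imgLeft.size());

// Run the block matcher on the rectified pair instead of the original images
sbm(rectLeft, rectRight, imgDisparity16S, CV_16S);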