I am trying to get deep into stitching. I am using cv::detail.
I am trying to follow this example:
I roughly understand the stitching pipeline.
there is a function matchesGraphAsString() which return a graph. I am wondering how does it even compute this graph. Further, what is the dfination of confidence interval in this case.
The output is in DOT format and a sample graph looks like
graph matches_graph{
"15.jpg" -- "13.jpg"[label="Nm=75, Ni=50, C=1.63934"];
"15.jpg" -- "12.jpg"[label="Nm=47, Ni=28, C=1.26697"];
"15.jpg" -- "14.jpg"[label="Nm=149, Ni=117, C=2.22011"];
"11.jpg" -- "13.jpg"[label="Nm=71, Ni=52, C=1.77474"];
"11.jpg" -- "9.jpg"[label="Nm=46, Ni=37, C=1.69725"];
"11.jpg" -- "10.jpg"[label="Nm=87, Ni=73, C=2.14076"];
"9.jpg" -- "8.jpg"[label="Nm=122, Ni=99, C=2.21973"];
}
What does label, Nm, and Ni mean here? The official document seems to be lacking these details.
Image stitching or photo stitching is the process of combining multiple photographic images with overlapping fields of view to produce a segmented panorama or high-resolution image.
This is a very interesting question indeed. As @hatboyzero pointed out, the meaning of the variables is reasonably straightforward:
Building a panorama is done by finding interest points in all images and computing descriptors for them. These descriptors, like SIFT, SURF and ORB, were developed so that the same parts of an image could be detected. They are just a medium-dimensional vector (64 or 128 dimensions are typical). By computing the L2 or some other distance between two descriptors, matches can be found. How many matches in a pair of images are found is described by the term Nm.
Notice that so far, the matching has only been done through appearance of image regions around interest points. Very typically, many of these matches are plain wrong. This can be because the descriptor looks the same (think: repetitive object like window sills on a multi-window building, or leaves on a tree) or because the descriptor is just a bit too uninformative.
The common solution is to add geometric constraints: The image pair was taken from the same position with the same camera, therefore points that are close in one image must be close in the other image, too. More specifically, all the points must have undergone the same transformation. In the panorama case where the camera was rotated around the nodal point of the camera-lens system this transformation must have been a 2D homography.
Ransac is the gold standard algorithm to find the best transformation and all the matches that are consistent with this tranformation. The number of these consistent matches is called Ni. Ransac works by randomly selecting in this case 4 matches (see paper sect 3.1) and fitting a homography to these four matches. Then, count how many matches from all possible matches would agree with this homography. Repeat 500 times (see paper) and at the end take the model that had the most inliers. Then re-compute the model with all inliers. The name of the algorithm comes from RANdom SAmple Consensus: RanSaC.
The question for me was, about this mysterious confidence. I quickly found where it was calculated.
From stitching/sources/matches.cpp:
// These coeffs are from paper M. Brown and D. Lowe. "Automatic Panoramic Image Stitching
// using Invariant Features"
matches_info.confidence = matches_info.num_inliers / (8 + 0.3 * matches_info.matches.size());
// Set zero confidence to remove matches between too close images, as they don't provide
// additional information anyway. The threshold was set experimentally.
matches_info.confidence = matches_info.confidence > 3. ? 0. : matches_info.confidence;
The mentioned paper has in section 3.2 ("Probabilistic Model for Image Match Verification") some more details to what this means.
Reading this section a few things stood out.
Though in practice we have chosen values for p0, p1, p(m = 0), p(m = 1) and pmin, they could in principle be learnt from the data.
So, this is just a theoretical exercise as the the parameters have been plucked out of thin air. Notice the could in principle be learnt.
The paper has in equation 13 the confidence calculation. If read correctly, it means that matches_info.confidence indicates a proper match between two images iff its value is above 1.
I don't see any justification in the removal of a match (setting confidence to 0) when the confidence is above 3. It just means that there are very little outliers. I think the programmers thought that a high number of matches that turn out to be outlier means that the images overlap a great deal, but this isn't provided by algorithms behind this. (Simply, the matchings are based on appearance of features.)
Glancing at the OpenCV source code available online, I gather that they mean the following:
I'm basing my assumptions on a snippet from the body of matchesGraphAsString in modules/stitching/src/motion_estimators.cpp from version 2.4.2 of the OpenCV source code. I.e.
        str << "\"" << name_src << "\" -- \"" << name_dst << "\""
            << "[label=\"Nm=" << pairwise_matches[pos].matches.size()
            << ", Ni=" << pairwise_matches[pos].num_inliers
            << ", C=" << pairwise_matches[pos].confidence << "\"];\n";
Additionally, I'm also looking at the documentation for detail::MatchesInfo for information about the Ni and C terms.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With