SIFT matches and recognition?

Tags: opencv, computer-vision, object-detection, sift

I am developing an application where I use SIFT + RANSAC and homography to find an object (OpenCV C++, Java). The problem I am facing is that RANSAC performs poorly when there are many outliers.

For this reason I would like to try what the author of SIFT said works pretty well: voting.

I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:

Location [x, y] (some call it Translation)
Scale
Orientation

While with OpenCV it is easy to get a match's scale and orientation with:

cv::KeyPoint::octave
cv::KeyPoint::angle

I am having a hard time understanding how to calculate the location.

I have found an interesting slide where, with only one match, we are able to draw a bounding box:

[Image: https://i.stack.imgur.com/ejPiP.png]

But I don't get how I could draw that bounding box with just one match. Any help?

694

asked Apr 11 '13 00:04

dynamic

2 Answers

You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.

Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1,y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly, you have (x2,y2), s2, and theta2 for f2.

The translation between two features is (dx,dy) = (x2-x1, y2-y1).

The scale change between two features is ds = s2 / s1.

The rotation between two features is d_theta = theta2 - theta1.

So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
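As a rough illustration of the voting step described above, here is a minimal C++ sketch with no OpenCV dependency. The `Feature` struct, the function names, and the bin widths (32 pixels, one octave of scale, 30 degrees) are assumptions made for the example, not something prescribed here; with OpenCV you would presumably fill `Feature` from `cv::KeyPoint` (`pt`, `size`, `angle`). Binning the scale change on a log2 axis is also an assumption, chosen so that doubling and halving get bins of equal width.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical minimal feature record (not an OpenCV type).
struct Feature {
    double x, y;      // location in pixels
    double scale;     // feature scale, must be > 0
    double angleDeg;  // orientation in degrees
};

// Similarity parameters implied by a single match f1 -> f2.
struct SimilarityVote {
    double dx, dy;     // translation (x2 - x1, y2 - y1)
    double ds;         // scale change s2 / s1
    double dThetaDeg;  // rotation theta2 - theta1, wrapped to [0, 360)
};

SimilarityVote voteFromMatch(const Feature& f1, const Feature& f2) {
    assert(f1.scale > 0.0 && f2.scale > 0.0);
    double dTheta = std::fmod(f2.angleDeg - f1.angleDeg, 360.0);
    if (dTheta < 0.0) dTheta += 360.0;
    return {f2.x - f1.x, f2.y - f1.y, f2.scale / f1.scale, dTheta};
}

using Bin = std::array<int, 4>;  // (dx, dy, log2(ds), d_theta) bin indices

// Quantize a vote into a 4-D bin; the bin widths are illustrative only.
Bin binOf(const SimilarityVote& v,
          double posBin = 32.0, double angleBin = 30.0) {
    Bin bin;
    bin[0] = static_cast<int>(std::floor(v.dx / posBin));
    bin[1] = static_cast<int>(std::floor(v.dy / posBin));
    bin[2] = static_cast<int>(std::floor(std::log2(v.ds)));  // one bin per octave
    bin[3] = static_cast<int>(std::floor(v.dThetaDeg / angleBin));
    return bin;
}

// Accumulate one vote per match and return the fullest bin with its count.
std::pair<Bin, int> houghPeak(
        const std::vector<std::pair<Feature, Feature>>& matches) {
    std::map<Bin, int> accumulator;
    for (const auto& m : matches) {
        ++accumulator[binOf(voteFromMatch(m.first, m.second))];
    }
    std::pair<Bin, int> best(Bin{{0, 0, 0, 0}}, 0);
    for (const auto& entry : accumulator) {
        if (entry.second > best.second) {
            best = std::make_pair(entry.first, entry.second);
        }
    }
    return best;
}
```

Calling `houghPeak` on the list of matched feature pairs returns the fullest bin and its vote count; the matches that land in that bin are the inlier candidates. A real implementation would probably also vote into neighbouring bins, since a hard quantization can split a cluster across a bin boundary.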

Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
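To make the last step concrete, here is a small C++ sketch that packs the four parameters into a 3x3 homogeneous matrix and maps the corners of image 1 into image 2. One detail has to be assumed, because rotation and scaling need a pivot: this sketch rotates and scales about the matched feature location `p1` in image 1 and then moves it onto `p2 = p1 + (dx, dy)`. The names `Point2`, `Mat3`, `similarityAbout`, `transformPoint` and `transformBox` are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Point2 { double x, y; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Homogeneous matrix for T(p2) * S(ds) * R(dTheta) * T(-p1):
// rotate and scale about p1, then place p1 on p2.
Mat3 similarityAbout(Point2 p1, Point2 p2, double ds, double dThetaDeg) {
    assert(ds > 0.0);
    const double kPi = 3.14159265358979323846;
    const double t = dThetaDeg * kPi / 180.0;
    const double a = ds * std::cos(t);
    const double b = ds * std::sin(t);
    Mat3 m{};
    // Linear part is [a -b; b a]; the last column makes p1 map to p2.
    m[0] = {a, -b, p2.x - (a * p1.x - b * p1.y)};
    m[1] = {b, a, p2.y - (b * p1.x + a * p1.y)};
    m[2] = {0.0, 0.0, 1.0};
    return m;
}

// Apply the matrix to the homogeneous point (x, y, 1).
Point2 transformPoint(const Mat3& m, Point2 p) {
    return {m[0][0] * p.x + m[0][1] * p.y + m[0][2],
            m[1][0] * p.x + m[1][1] * p.y + m[1][2]};
}

// Map the four corners of a width x height image (origin at top-left).
std::array<Point2, 4> transformBox(const Mat3& m, double width, double height) {
    std::array<Point2, 4> corners;
    corners[0] = transformPoint(m, {0.0, 0.0});
    corners[1] = transformPoint(m, {width, 0.0});
    corners[2] = transformPoint(m, {width, height});
    corners[3] = transformPoint(m, {0.0, height});
    return corners;
}
```

Because the pivot is a matched feature, the same function also covers the single-match case from the question: one match supplies `p1`, `p2`, `ds` and `d_theta`, which is enough to place all four corners. With a Hough peak you would use the bin's parameters together with one of the matches that voted for it, or a least-squares fit over all of them.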

190

answered Oct 26 '22 22:10

Dima

When using the Hough transform, you create a signature storing the displacement vectors of every feature from the template centroid (either (w/2,h/2) or with the help of central moments).

E.g. for 10 SIFT features found on the template, their positions relative to the template's centroid form a vector<{a,b}>. Now, let's search for this object in a query image: every SIFT feature found in the query image that is matched with one of the template's 10 casts a vote for the corresponding centroid.

votemap(feature.x - a*, feature.y - b*)+=1 where (a*, b*) is the stored displacement (a, b) of the matched template feature, scaled and rotated to fit the query feature.

If several of those features vote for the same point (clustering is essential), you have found an object instance.
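The signature and voting steps could be sketched in C++ roughly as follows. Everything named here (`SiftPoint`, `Offset`, `Match`, the functions, and the 8-pixel vote cell) is hypothetical and only meant to show the idea; it assumes that (a, b) is stored as feature position minus centroid, and that scale ratio and orientation difference between the matched features are used to obtain (a*, b*).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical feature record: location, scale, orientation in degrees.
struct SiftPoint { double x, y, scale, angleDeg; };

// Displacement (a, b) of a template feature from the template centroid.
struct Offset { double a, b; };

// Signature: one offset per template feature, relative to centroid (cx, cy).
std::vector<Offset> buildSignature(const std::vector<SiftPoint>& tmpl,
                                   double cx, double cy) {
    std::vector<Offset> signature;
    signature.reserve(tmpl.size());
    for (const SiftPoint& f : tmpl) signature.push_back({f.x - cx, f.y - cy});
    return signature;
}

// Centroid position in the query image predicted by a single match.
// The stored offset is scaled and rotated into (a*, b*) and then
// subtracted from the query feature location.
std::pair<double, double> predictCentroid(const SiftPoint& t, const Offset& o,
                                          const SiftPoint& q) {
    assert(t.scale > 0.0);
    const double kPi = 3.14159265358979323846;
    const double ds = q.scale / t.scale;
    const double th = (q.angleDeg - t.angleDeg) * kPi / 180.0;
    const double aStar = ds * (std::cos(th) * o.a - std::sin(th) * o.b);
    const double bStar = ds * (std::sin(th) * o.a + std::cos(th) * o.b);
    return std::make_pair(q.x - aStar, q.y - bStar);
}

// A match pairs a template feature index with a query feature.
struct Match { std::size_t templateIndex; SiftPoint query; };

// Cast one vote per match into a coarse grid; cellSize is an assumed value.
std::map<std::pair<int, int>, int> voteForCentroid(
        const std::vector<SiftPoint>& tmpl, const std::vector<Offset>& sig,
        const std::vector<Match>& matches, double cellSize = 8.0) {
    std::map<std::pair<int, int>, int> votemap;
    for (const Match& m : matches) {
        const std::pair<double, double> c = predictCentroid(
            tmpl[m.templateIndex], sig[m.templateIndex], m.query);
        const std::pair<int, int> cell(
            static_cast<int>(std::floor(c.first / cellSize)),
            static_cast<int>(std::floor(c.second / cellSize)));
        ++votemap[cell];
    }
    return votemap;
}
```

Cells of the returned map that collect several votes are candidate object centroids. The grid is the simplest possible form of clustering; something like mean shift over the raw predicted centroids would be a reasonable alternative.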

[Image: https://i.stack.imgur.com/K1QJZ.png]

Signature and voting are reverse procedures. Let's assume V=(-20,-10). So while searching in the novel image, when the two matches are found, we detect their orientation and size and cast a corresponding vote. E.g. for the right box, the centroid will be V' = 0.5 * R(-10°) * (+20, +10) = (0.5*(20*cos(-10°) - 10*sin(-10°)), 0.5*(20*sin(-10°) + 10*cos(-10°))) away from the SIFT feature, because the object is at half size and rotated by -10 degrees.

answered Oct 26 '22 20:10

LovaBill
