 

SIFT matches and recognition?

I am developing an application where I am using SIFT + RANSAC and homography to find an object (OpenCV, C++/Java). The problem I am facing is that when there are many outliers, RANSAC performs poorly.

For this reason I would like to try what the author of SIFT said works pretty well: voting.

I have read that we should vote in a 4-dimensional feature space, where the 4 dimensions are:

  • Location [x, y] (some call it translation)
  • Scale
  • Orientation

While with OpenCV it is easy to get a match's scale and orientation with:

cv::KeyPoint::octave
cv::KeyPoint::angle
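
For example, a minimal sketch of reading those fields for a matched pair (the function and variable names are just illustrative):

#include <opencv2/features2d.hpp>

// Sketch: scale and orientation of two matched keypoints (illustrative names).
void inspectMatch(const cv::KeyPoint& kp1, const cv::KeyPoint& kp2)
{
    float rotation    = kp2.angle - kp1.angle;   // kp.angle is in degrees
    float scaleChange = kp2.size  / kp1.size;    // kp.size: neighbourhood diameter
    // For SIFT, kp.octave packs octave and layer into one int, so kp.size is
    // usually the more convenient measure of scale; kp.pt holds the location (x, y).
    (void)rotation; (void)scaleChange;
}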

I am having a hard time understanding how I can calculate the location.

I have found an interesting slide where, with only one match, we are able to draw a bounding box:

But I don't get how I could draw that bounding box with just one match. Any help?

asked Apr 11 '13 by dynamic


People also ask

What is SIFT feature matching?

SIFT helps locate the local features in an image, commonly known as the 'keypoints' of the image. These keypoints are scale and rotation invariant and can be used for various computer vision applications, like image matching, object detection, scene detection, etc.

What is SIFT algorithm used for?

Scale-Invariant Feature Transform (SIFT)—SIFT is an algorithm in computer vision to detect and describe local features in images. It is a feature that is widely used in image processing. The processes of SIFT include Difference of Gaussians (DoG) Space Generation, Keypoints Detection, and Feature Description.

Is SIFT better than ORB?

We showed that ORB is the fastest algorithm, while SIFT performs best in most scenarios. For the special case when the angle of rotation is a multiple of 90 degrees, ORB and SURF outperform SIFT, and on noisy images ORB and SIFT show almost similar performance.

Is SIFT a detector or descriptor?

The Scale-Invariant Feature Transform (SIFT) bundles a feature detector and a feature descriptor. The detector extracts from an image a number of frames (attributed regions) in a way which is consistent with (some) variations of the illumination, viewpoint and other viewing conditions.


2 Answers

You are looking for the largest set of matched features that fit a geometric transformation from image 1 to image 2. In this case, it is the similarity transformation, which has 4 parameters: translation (dx, dy), scale change ds, and rotation d_theta.

Let's say you have matched two features: f1 from image 1 and f2 from image 2. Let (x1, y1) be the location of f1 in image 1, let s1 be its scale, and let theta1 be its orientation. Similarly, you have (x2, y2), s2, and theta2 for f2.

The translation between two features is (dx,dy) = (x2-x1, y2-y1).

The scale change between two features is ds = s2 / s1.

The rotation between two features is d_theta = theta2 - theta1.

So, dx, dy, ds, and d_theta are the dimensions of your Hough space. Each bin corresponds to a similarity transformation.
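
A minimal sketch of that voting step, assuming you already have matched keypoints from both images and have picked bin widths (the bin sizes and the map key below are illustrative choices, not prescribed by this answer):

#include <opencv2/features2d.hpp>
#include <array>
#include <cmath>
#include <map>
#include <vector>

// One vote per match in the 4-D (dx, dy, ds, d_theta) Hough space.
std::map<std::array<int, 4>, int> houghVote(
        const std::vector<cv::KeyPoint>& kp1,          // image 1 keypoints
        const std::vector<cv::KeyPoint>& kp2,          // image 2 keypoints
        const std::vector<cv::DMatch>& matches,
        float binXY = 16.f, float binLogS = 0.25f, float binTheta = 30.f)
{
    std::map<std::array<int, 4>, int> bins;
    for (const cv::DMatch& m : matches) {
        const cv::KeyPoint& f1 = kp1[m.queryIdx];
        const cv::KeyPoint& f2 = kp2[m.trainIdx];
        float dx = f2.pt.x - f1.pt.x;                  // translation
        float dy = f2.pt.y - f1.pt.y;
        float ds = f2.size / f1.size;                  // scale change
        float dTheta = f2.angle - f1.angle;            // rotation, degrees
        if (dTheta < 0.f) dTheta += 360.f;
        std::array<int, 4> key = {
            (int)std::floor(dx / binXY),
            (int)std::floor(dy / binXY),
            (int)std::floor(std::log2(ds) / binLogS),  // bin scale on a log axis
            (int)std::floor(dTheta / binTheta) };
        ++bins[key];                                   // cast the vote
    }
    return bins;   // the key with the largest count is the dominant transformation
}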

Once you have performed Hough voting, and found the maximum bin, that bin gives you a transformation from image 1 to image 2. One thing you can do is take the bounding box of image 1 and transform it using that transformation: apply the corresponding translation, rotation and scaling to the corners of the image. Typically, you pack the parameters into a transformation matrix, and use homogeneous coordinates. This will give you the bounding box in image 2 corresponding to the object you've detected.
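
A minimal sketch of that last step, assuming the winning bin gave you (dx, dy, ds, d_theta); cv::transform applies the 2x3 similarity matrix to each corner using homogeneous coordinates:

#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Map the corners of image 1 into image 2 with the winning similarity transform.
std::vector<cv::Point2f> projectBoundingBox(cv::Size img1Size,
                                            float dx, float dy,
                                            float ds, float dThetaDeg)
{
    std::vector<cv::Point2f> corners = {
        {0.f, 0.f},
        {(float)img1Size.width, 0.f},
        {(float)img1Size.width, (float)img1Size.height},
        {0.f, (float)img1Size.height}};

    float t = dThetaDeg * (float)CV_PI / 180.f;
    // 2x3 similarity matrix: scaled rotation plus translation.
    cv::Matx23f M(ds * std::cos(t), -ds * std::sin(t), dx,
                  ds * std::sin(t),  ds * std::cos(t), dy);

    std::vector<cv::Point2f> projected;
    cv::transform(corners, projected, M);   // projected[i] = M * [x, y, 1]^T
    return projected;                       // bounding box corners in image 2
}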

answered Oct 26 '22 by Dima


When using the Hough transform, you create a signature storing the displacement vector of each feature from the template centroid (either (w/2, h/2) or computed from central moments).

E.g. for 10 SIFT features found on the template, their positions relative to the template's centroid form a vector<{a,b}>. Now, let's search for this object in a query image: every SIFT feature found in the query image that is matched with one of the template's 10 casts a vote for its corresponding centroid.

votemap(feature.x - a, feature.y - b) += 1, where (a, b) is the displacement stored for this particular matched feature.

If several of those features successfully cast votes at the same point (clustering is essential), you have found an object instance.
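
A rough sketch of that scheme, ignoring scale and rotation for the moment (the helper names are illustrative; it assumes the template descriptors were the "train" set when matching):

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <vector>

// Signature: displacement (a, b) of each template keypoint from the template centroid.
std::vector<cv::Point2f> buildSignature(const std::vector<cv::KeyPoint>& tmplKp,
                                        cv::Point2f centroid)
{
    std::vector<cv::Point2f> signature;
    for (const cv::KeyPoint& kp : tmplKp)
        signature.push_back(kp.pt - centroid);
    return signature;
}

// Every matched query feature casts one vote for where the centroid should be.
cv::Mat voteForCentroid(const std::vector<cv::KeyPoint>& queryKp,
                        const std::vector<cv::DMatch>& matches,
                        const std::vector<cv::Point2f>& signature,
                        cv::Size queryImageSize)
{
    cv::Mat votemap = cv::Mat::zeros(queryImageSize, CV_32F);
    for (const cv::DMatch& m : matches) {
        const cv::Point2f& f = queryKp[m.queryIdx].pt;   // feature in the query image
        const cv::Point2f& v = signature[m.trainIdx];    // its stored displacement (a, b)
        cv::Point2f c = f - v;                           // predicted centroid location
        if (c.x >= 0 && c.y >= 0 && c.x < votemap.cols && c.y < votemap.rows)
            votemap.at<float>((int)c.y, (int)c.x) += 1.f;
    }
    return votemap;   // a strong peak marks a detected object instance
}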


Signature building and voting are inverse procedures. Let's assume a stored displacement V = (-20, -10). While searching the novel image, when a match is found we read its relative orientation and scale and cast a correspondingly adjusted vote. E.g., if the detected instance is at half size and rotated by -10 degrees, the centroid vote is cast at V' = 0.5 · R(-10°) · V relative to the matched SIFT feature, where R is the 2×2 rotation matrix.
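
A small sketch of that adjustment, assuming the relative scale and rotation come from comparing the matched keypoints' size and angle fields (names are illustrative):

#include <opencv2/core.hpp>
#include <cmath>

// Scale and rotate the stored displacement V before casting the vote, so the
// prediction follows the detected instance's size and orientation.
cv::Point2f adjustedCentroidVote(cv::Point2f featureInQuery,
                                 cv::Point2f V,      // displacement stored in the signature
                                 float dScale,       // e.g. 0.5 for a half-size instance
                                 float dThetaDeg)    // e.g. -10 degrees of rotation
{
    float t = dThetaDeg * (float)CV_PI / 180.f;
    cv::Point2f rotated(V.x * std::cos(t) - V.y * std::sin(t),
                        V.x * std::sin(t) + V.y * std::cos(t));
    return featureInQuery - dScale * rotated;        // predicted centroid position
}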

answered Oct 26 '22 by LovaBill