I am trying to obtain feature vectors for about 1300 images in my data set. One of the features I have to implement is shape, so I plan to use SIFT descriptors. However, each image returns a different number of keypoints. I run:
[F,D] = vl_sift(single(rgb2gray(image)));  % vl_sift expects a single-precision grayscale image (assumes image is RGB)
F is of size 4 x N and D is of size 128 x N, where N is the number of keypoints detected.
However, I want to obtain a single 128 x 1 vector that represents an image as well as possible. I have seen things like clustering and k-means, but I have no idea how to use them.
The most basic idea is to average these N vectors of size 128 x 1, which gives me a single feature vector. But is taking the average meaningful? Should I build some kind of histogram instead?
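To make the averaging idea concrete, here is a toy sketch in plain Python. `mean_pool` is a hypothetical helper name, and 2-element vectors stand in for 128-element SIFT descriptors. It computes the average and also shows its main weakness: two quite different sets of descriptors can collapse to the same averaged vector.

```python
def mean_pool(descriptors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(descriptors)
    # zip(*descriptors) walks the vectors column by column (one dimension at a time).
    return [sum(column) / n for column in zip(*descriptors)]

# Two images with very different local structure...
image_a = [[0.0, 2.0], [2.0, 0.0]]  # two "peaked" descriptors
image_b = [[1.0, 1.0], [1.0, 1.0]]  # two "flat" descriptors

# ...produce the same averaged vector, so the average cannot tell them apart.
print(mean_pool(image_a))  # [1.0, 1.0]
print(mean_pool(image_b))  # [1.0, 1.0]
```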
Any help will be appreciated. Thanks!
Typically, the features are quantized using k-means clustering. First, you decide on a "vocabulary size" (say, 200 "visual words"), and then you run k-means clustering with that number of clusters (200). The SIFT descriptors are vectors of 128 elements, i.e. points in 128-dimensional space, so k-means groups these points into 200 clusters whose centers act as the visual words.
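The clustering step can be sketched with plain Lloyd's k-means. This is a minimal pure-Python illustration under stated assumptions, not VLFeat's `vl_kmeans` or any other library's implementation; `kmeans`, `iterations` and `seed` are hypothetical names. In practice you would pool the descriptors (the columns of D) from many training images and run an optimized implementation on them.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers ("visual words").

    points: list of equal-length vectors (e.g. 128-D SIFT descriptors
    pooled from many training images).
    """
    rng = random.Random(seed)
    # Initialize the centers with k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its assigned points.
        for c, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# Toy 2-D "descriptors" forming two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers = sorted(kmeans(points, k=2))
print(centers)  # one center near (0.33, 0.33), the other near (10.33, 10.33)
```

A fixed number of iterations keeps the sketch short; a production version would usually stop once the assignments no longer change.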
SIFT is invariant to image scale and rotation. The algorithm was patented, so OpenCV used to ship it in its non-free module; the patent expired in 2020, and since OpenCV 4.4 SIFT is part of the main module. Two key properties of SIFT features are locality (features are local, so they are robust to occlusion and clutter, and no prior segmentation is needed) and distinctiveness (individual features can be matched against a large database of objects).
In 2004, D. Lowe of the University of British Columbia presented the Scale Invariant Feature Transform (SIFT) in his paper "Distinctive Image Features from Scale-Invariant Keypoints"; the algorithm extracts keypoints and computes their descriptors. (This paper is easy to understand and is considered the best material available on SIFT.)
SIFT is quite an involved algorithm. There are four main steps:
1. Scale-space peak selection: finding potential locations for features.
2. Keypoint localization: accurately locating the feature keypoints.
3. Orientation assignment: assigning an orientation to each keypoint.
4. Keypoint descriptor: describing each keypoint as a 128-element vector.
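The first step, scale-space peak selection, can be sketched as a search for local extrema in a stack of difference-of-Gaussians (DoG) images: a sample is a candidate keypoint if it is larger or smaller than all 26 neighbours in the 3x3x3 cube spanning its own scale and the scales above and below. This is a simplified illustration, not Lowe's full implementation. It assumes the DoG images have already been computed, and it omits the sub-pixel refinement and edge-response rejection of keypoint localization. `scale_space_extrema` and `threshold` are hypothetical names.

```python
from itertools import product

def scale_space_extrema(dog, threshold=0.0):
    """Return (scale, row, col) positions that are strict extrema of a DoG stack.

    dog: list of equally sized 2-D lists, one DoG image per scale,
         ordered from fine to coarse.
    threshold: samples with |value| <= threshold are skipped
               (a crude low-contrast filter).
    """
    found = []
    # Border scales and pixels lack a full 3x3x3 neighbourhood, so skip them.
    for s in range(1, len(dog) - 1):
        rows, cols = len(dog[s]), len(dog[s][0])
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                v = dog[s][r][c]
                if abs(v) <= threshold:
                    continue
                # The 26 neighbours: 8 in this scale, 9 in each adjacent scale.
                neighbours = [
                    dog[s + ds][r + dr][c + dc]
                    for ds, dr, dc in product((-1, 0, 1), repeat=3)
                    if (ds, dr, dc) != (0, 0, 0)
                ]
                if all(v > n for n in neighbours) or all(v < n for n in neighbours):
                    found.append((s, r, c))
    return found

# Toy stack: three 5x5 scales with one peak and one dip in the middle scale.
dog = [[[0.0] * 5 for _ in range(5)] for _ in range(3)]
dog[1][2][2] = 1.0   # local maximum
dog[1][3][3] = -1.0  # local minimum
print(scale_space_extrema(dog))  # [(1, 2, 2), (1, 3, 3)]
```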
This is actually a big research problem. You are correct: averaging all the descriptors will not be meaningful. There are several approaches for creating a single vector out of a set of local descriptors. One big class of methods is called "bag of features" or "bag of visual words". The general idea is to cluster local descriptors (e.g. SIFT) from many images (e.g. using k-means). Then you take a particular image, determine which cluster each descriptor from that image belongs to, and build a histogram of those cluster assignments. There are different ways of doing the clustering and different ways of creating and normalizing the histogram.
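The histogram step can be sketched as follows, assuming a vocabulary of cluster centers has already been learned. Note that the resulting vector has one entry per visual word (e.g. 200), not 128. This is a minimal pure-Python illustration with hard assignment to the nearest word; `bow_histogram` and its `norm` parameter are hypothetical names, and L1/L2 normalization are just two of the possible choices.

```python
import math

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bow_histogram(descriptors, vocabulary, norm="l1"):
    """Bag-of-visual-words vector for one image.

    descriptors: the image's local descriptors (e.g. the columns of D).
    vocabulary:  cluster centers from k-means, one per visual word.
    norm:        "l1" (entries sum to 1), "l2" (unit length) or None (raw counts).
    Returns a list with one entry per visual word.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Hard assignment: each descriptor votes for its nearest visual word.
        word = min(range(len(vocabulary)),
                   key=lambda w: squared_distance(d, vocabulary[w]))
        counts[word] += 1
    if norm == "l1":
        total = sum(counts)
        return [c / total for c in counts] if total else [0.0] * len(counts)
    if norm == "l2":
        length = math.sqrt(sum(c * c for c in counts))
        return [c / length for c in counts] if length else [0.0] * len(counts)
    return [float(c) for c in counts]

vocabulary = [[0.0, 0.0], [10.0, 10.0]]        # toy 2-word vocabulary
image = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]  # toy descriptors of one image
print(bow_histogram(image, vocabulary))        # two votes for word 0, one for word 1
```

Normalizing makes images with many keypoints comparable to images with few, since the raw counts grow with the number of detected keypoints.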
A somewhat different approach is called "Pyramid Match Kernel", which is a way of training an SVM classifier on sets of local descriptors.
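The core idea of the pyramid match can be sketched as follows: place both descriptor sets into histograms with progressively larger bins, count how many new matches (histogram intersection) appear at each level, and weight matches found in finer bins more heavily. This is a simplified sketch, not the authors' implementation. It uses toy points with small non-negative integer coordinates and weights that halve at each coarser level (the paper's exact constants may differ), and it only computes the kernel value that an SVM would consume. `pyramid_match`, `normalized_pyramid_match` and `levels` are hypothetical names.

```python
from collections import Counter
import math

def histogram_intersection(h1, h2):
    """Number of matches between two histograms: the sum of bin-wise minima."""
    return sum(min(count, h2[b]) for b, count in h1.items())

def pyramid_match(x, y, levels):
    """Simplified pyramid match score between two point sets.

    x, y:   lists of equal-length tuples of non-negative integers.
    levels: top level L; choose 2**L larger than every coordinate so that
            all points share a single bin at the coarsest level.
    """
    score = 0.0
    previous = 0  # matches already counted at finer levels
    for i in range(levels + 1):
        side = 2 ** i  # bin side length at level i
        hx = Counter(tuple(v // side for v in p) for p in x)
        hy = Counter(tuple(v // side for v in p) for p in y)
        matches = histogram_intersection(hx, hy)
        # Only matches that first appear at this level count, with weight 1 / 2**i.
        score += (matches - previous) / side
        previous = matches
    return score

def normalized_pyramid_match(x, y, levels):
    """Divide by the self-similarities so that identical sets score 1."""
    return pyramid_match(x, y, levels) / math.sqrt(
        pyramid_match(x, x, levels) * pyramid_match(y, y, levels))

x = [(0, 0), (4, 4)]
y = [(0, 0), (5, 5)]
print(pyramid_match(x, y, levels=3))             # 1.5
print(normalized_pyramid_match(x, y, levels=3))  # 0.75
```

Here (0, 0) matches exactly at level 0 (weight 1), while (4, 4) and (5, 5) only share a bin from level 1 on (weight 1/2), giving 1 + 0.5 = 1.5.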
So for starters google "bag of features" or "bag of visual words".