I often get confused by the meaning of the term descriptor in the context of image features. Is a descriptor the description of the local neighborhood of a point (e.g. a float vector), or is it the algorithm that outputs the description? Also, what exactly is the output of a feature extractor, then?
I have been asking myself this question for a long time, and the only explanation I have come up with is that a descriptor is both the algorithm and the description. A feature detector is used to detect distinctive points. Given that, however, a feature extractor does not seem to make any sense.
So, is a feature descriptor the description or the algorithm that produces the description?
It is a simplified representation of the image that contains only the most important information about the image. There are a number of feature descriptors out there. Here are a few of the most popular ones: HOG: Histogram of Oriented Gradients. SIFT: Scale Invariant Feature Transform.
Feature detection refers to methods that compute abstractions of image information and make a local decision at every image point as to whether an image feature of a given type is present at that point. Feature detection is a low-level image processing operation.
Feature extraction means computing a descriptor from the pixels around each interest point. The simplest descriptor is just the raw pixel values in a small patch around the interest point. More sophisticated descriptors include SURF, HOG, and FREAK.
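To make the "raw pixel values in a small patch" case concrete, here is a minimal sketch in plain Python. The names `raw_patch_descriptor` and `extract_features` are made up for illustration, and the image is assumed to be a grayscale list of rows with interest points given as (x, y) pairs.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # assumed layout: grayscale, image[y][x]


def raw_patch_descriptor(image: Image, x: int, y: int, radius: int = 1) -> List[float]:
    """Flatten the (2*radius+1) x (2*radius+1) patch centred on (x, y) into a vector."""
    height, width = len(image), len(image[0])
    if not (radius <= x < width - radius and radius <= y < height - radius):
        raise ValueError("patch would extend past the image border")
    return [
        float(image[row][col])
        for row in range(y - radius, y + radius + 1)
        for col in range(x - radius, x + radius + 1)
    ]


def extract_features(image: Image, keypoints: Sequence[Tuple[int, int]],
                     radius: int = 1) -> List[List[float]]:
    """Feature extraction in the sense above: one descriptor per interest point."""
    return [raw_patch_descriptor(image, x, y, radius) for x, y in keypoints]


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 3, 0],
    [0, 4, 5, 6, 0],
    [0, 7, 8, 9, 0],
    [0, 0, 0, 0, 0],
]
descriptors = extract_features(image, [(2, 2)])
print(descriptors)  # one 9-element vector for the single interest point
```

In this reading, the extractor is the function and the descriptor is the list of numbers it returns for each interest point.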
Image Feature Vector: An abstraction of an image used to characterize and numerically quantify the contents of an image. Normally real, integer, or binary valued. Simply put, a feature vector is a list of numbers used to represent an image.
A feature detector is an algorithm which takes an image and outputs locations (i.e. pixel coordinates) of significant areas in your image. An example of this is a corner detector, which outputs the locations of corners in your image but does not tell you any other information about the features detected.
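As a rough illustration of "image in, locations out", here is a small corner detector sketch in plain Python. It scores each pixel by the smaller eigenvalue of the local gradient structure tensor (the idea behind Shi-Tomasi-style detectors). The function name `detect_corners`, the 3x3 window, and the threshold of 0.5 are assumptions chosen for this toy example, not values from any particular library.

```python
import math
from typing import List, Sequence, Tuple


def detect_corners(image: Sequence[Sequence[float]],
                   threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (x, y) locations whose minimum-eigenvalue corner score exceeds threshold."""
    height, width = len(image), len(image[0])
    # Central-difference gradients; border pixels keep a zero gradient.
    ix = [[0.0] * width for _ in range(height)]
    iy = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0

    corners = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Sum gradient products over a 3x3 window (the structure tensor).
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Smaller eigenvalue of [[sxx, sxy], [sxy, syy]]: large only if the
            # intensity changes in two directions, i.e. at a corner.
            min_eig = (sxx + syy) / 2.0 - math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
            if min_eig > threshold:
                corners.append((x, y))
    return corners


# A dark image with one bright quadrant: a single corner at (3, 3).
quadrant = [[1.0 if x >= 3 and y >= 3 else 0.0 for x in range(6)] for y in range(6)]
print(detect_corners(quadrant))  # -> [(3, 3)]
```

Note that the output is only a list of coordinates. Nothing about the appearance of the corner is returned, which is exactly the gap a descriptor fills.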
A feature descriptor is an algorithm which takes an image and outputs feature descriptors/feature vectors. Feature descriptors encode interesting information into a series of numbers and act as a sort of numerical "fingerprint" that can be used to differentiate one feature from another. Ideally this information would be invariant under image transformation, so we can find the feature again even if the image is transformed in some way. An example would be SIFT, which encodes information about the image gradients in the local neighbourhood into the numbers of the feature vector. Other examples you can read about are HOG and SURF.
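To illustrate the "fingerprint" idea, here is a toy descriptor in plain Python: a normalised histogram of gradient orientations around a point. It is loosely inspired by the gradient histograms in SIFT and HOG but is not an implementation of either. The names `orientation_histogram` and `descriptor_distance`, the 8 bins, and the radius of 2 are assumptions for this sketch, and the point must lie at least `radius + 1` pixels from the border.

```python
import math
from typing import List, Sequence


def orientation_histogram(image: Sequence[Sequence[float]], x: int, y: int,
                          radius: int = 2, bins: int = 8) -> List[float]:
    """Describe the neighbourhood of (x, y) by its gradient orientations."""
    hist = [0.0] * bins
    for row in range(y - radius, y + radius + 1):
        for col in range(x - radius, x + radius + 1):
            gx = (image[row][col + 1] - image[row][col - 1]) / 2.0
            gy = (image[row + 1][col] - image[row - 1][col]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(gy, gx) % (2.0 * math.pi)
            index = int(angle / (2.0 * math.pi) * bins) % bins
            hist[index] += magnitude  # stronger gradients count for more
    # Normalising to unit length removes the effect of overall contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0.0 else hist


def descriptor_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptors; smaller means more similar."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


ramp_x = [[x for x in range(7)] for y in range(7)]            # brightens to the right
ramp_y = [[y for x in range(7)] for y in range(7)]            # brightens downwards
ramp_x_brighter = [[x + 10 for x in range(7)] for y in range(7)]

d_x = orientation_histogram(ramp_x, 3, 3)
d_y = orientation_histogram(ramp_y, 3, 3)
d_x_brighter = orientation_histogram(ramp_x_brighter, 3, 3)

print(descriptor_distance(d_x, d_x_brighter))  # same structure, brighter image
print(descriptor_distance(d_x, d_y))           # different structure
```

Because the descriptor is built from gradients, adding a constant brightness to the image leaves it unchanged, which is a small instance of the invariance mentioned above. Matching then amounts to comparing descriptors by distance.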
EDIT: When it comes to feature detectors, the "location" might also include a number describing the size or scale of the feature. This is because things that look like corners when "zoomed in" may not look like corners when "zoomed out", and so specifying scale information is important. So instead of just using an (x,y) pair as a location in "image space", you might have a triple (x,y,scale) as a location in "scale space".
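One way to picture the (x,y,scale) triple is a detection made on a downsampled ("zoomed out") copy of the image and then mapped back to the original. The sketch below assumes a pyramid in which each level halves the resolution; `Keypoint` and `to_scale_space` are hypothetical names, and real detectors define scale in their own ways.

```python
from typing import NamedTuple


class Keypoint(NamedTuple):
    x: float      # column in the original image
    y: float      # row in the original image
    scale: float  # size of the feature relative to the original image


def to_scale_space(x: int, y: int, level: int) -> Keypoint:
    """Map a detection at pyramid `level` to original-image coordinates.

    Assumes level 0 is the original image and each level halves the resolution.
    """
    factor = 2 ** level
    return Keypoint(float(x * factor), float(y * factor), float(factor))


# A corner found at (5, 3) on the twice-downsampled image (level 2).
print(to_scale_space(5, 3, level=2))
```

Two detections with the same (x,y) but different scales are then different locations in scale space, and a descriptor would be computed over a neighbourhood whose size follows the scale.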