How does the SiftDescriptorExtractor from OpenCV convert descriptor values?

I have a question about the last step of the SiftDescriptorExtractor's job.

I'm doing the following:

    SiftDescriptorExtractor extractor;
    Mat descriptors_object;
    extractor.compute( img_object, keypoints_object, descriptors_object );

Now I want to check the elements of a descriptors_object Mat object:

    std::cout << descriptors_object.row(1) << std::endl;

The output looks like:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 32, 15, 0, 0, 0, 0, 0, 0, 73, 33, 11, 0, 0, 0, 0, 0, 0, 5, 114, 1, 0, 0, 0, 0, 51, 154, 20, 0, 0, 0, 0, 0, 154, 154, 1, 2, 1, 0, 0, 0, 154, 148, 18, 1, 0, 0, 0, 0, 0, 2, 154, 61, 0, 0, 0, 0, 5, 60, 154, 30, 0, 0, 0, 0, 34, 70, 6, 15, 3, 2, 1, 0, 14, 16, 2, 0, 0, 0, 0, 0, 0, 0, 154, 84, 0, 0, 0, 0, 0, 0, 154, 64, 0, 0, 0, 0, 0, 0, 6, 6, 1, 0, 1, 0, 0, 0]

But in Lowe paper it is stated that:

Therefore, we reduce the influence of large gradient magnitudes by thresholding the values in the unit feature vector to each be no larger than 0.2, and then renormalizing to unit length. This means that matching the magnitudes for large gradients is no longer as important, and that the distribution of orientations has greater emphasis. The value of 0.2 was determined experimentally using images containing differing illuminations for the same 3D objects.

So the values in the feature vector should be no larger than 0.2.

The question is: how have these values been converted into a Mat object?

asked Feb 16 '13 by fen1ksss

1 Answer

So the values in the feature vector should be no larger than 0.2.

No. The paper says that SIFT descriptors are:

  1. normalized (with the L2 norm)
  2. truncated using 0.2 as a threshold (i.e. each normalized value larger than 0.2 is clamped to 0.2)
  3. normalized again

So in theory any SIFT descriptor component lies in [0, 1], even though in practice the effective range observed is smaller (see below).

The question is: how have these values been converted into a Mat object?

They are converted from floating-point values to unsigned char values.

Here's the related section from OpenCV modules/nonfree/src/sift.cpp calcSIFTDescriptor method:

    // First pass: compute the squared L2 norm of the raw descriptor.
    float nrm2 = 0;
    len = d*d*n;
    for( k = 0; k < len; k++ )
        nrm2 += dst[k]*dst[k];

    // Truncate each component at 0.2 times the norm (equivalent to
    // normalizing first and truncating at 0.2), accumulating the new norm.
    float thr = std::sqrt(nrm2)*SIFT_DESCR_MAG_THR;
    for( i = 0, nrm2 = 0; i < k; i++ )
    {
        float val = std::min(dst[i], thr);
        dst[i] = val;
        nrm2 += val*val;
    }

    // Renormalize and quantize to unsigned char with a factor of 512.
    nrm2 = SIFT_INT_DESCR_FCTR/std::max(std::sqrt(nrm2), FLT_EPSILON);
    for( k = 0; k < len; k++ )
    {
        dst[k] = saturate_cast<uchar>(dst[k]*nrm2);
    }

With:

    static const float SIFT_INT_DESCR_FCTR = 512.f;

This is because classical SIFT implementations quantize the normalized floating-point values into unsigned char integers using a multiplying factor of 512. That is equivalent to assuming that any SIFT component varies within [0, 1/2], and avoids losing precision by trying to encode the full [0, 1] range.

answered Sep 21 '22 by deltheil