
HOG: What is done in the contrast-normalization step?

According to the HOG process, as described in the paper "Histograms of Oriented Gradients for Human Detection" (see link below), the contrast normalization step is done after the orientation binning and the weighted vote.

I don't understand one thing: if I have already computed the cells' weighted gradients, how can normalizing the image's contrast help me at that point?

As far as I understand, contrast normalization is applied to the original image, but by that point I have already computed the X and Y derivatives of the ORIGINAL image. So, if I normalize the contrast and want it to take effect, I would have to compute everything again.

Is there something I don't understand well?

Should I normalize the cells' values?

Or is the normalization in HOG not about image contrast at all, but about the histogram values (the counts in each bin)?

Link to the paper: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf

SomethingSomething asked Dec 08 '22 01:12



1 Answer

The contrast normalization is achieved by normalization of each block's local histogram.

The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract

When you normalize the block histogram, you are effectively normalizing the contrast within that block, provided that your histogram really contains the sum of gradient magnitudes for each direction.
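A small pure-Python sketch of why this works, under the assumption that a contrast change can be modeled as multiplying every pixel by a constant factor. In that case every gradient magnitude, and therefore every histogram entry, is multiplied by the same factor, and dividing by the block norm cancels it. The function name `l2_normalize` and the toy values are illustrative only, not taken from the paper.

```python
import math

def l2_normalize(v, eps=1e-6):
    """Divide v by sqrt(sum(v_i^2) + eps^2)."""
    k = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / k for x in v]

# Toy block vector: summed gradient magnitudes per orientation bin.
block = [0.361, 0.0, 1.2, 0.05, 0.0, 0.7, 0.0, 0.0, 0.3]

# Assumption: a contrast change multiplies the image by a constant, which
# multiplies every gradient magnitude (and every histogram entry) by it too.
high_contrast = [2.5 * x for x in block]

a = l2_normalize(block)
b = l2_normalize(high_contrast)

# Largest difference between the two normalized vectors (close to 0).
print(max(abs(x - y) for x, y in zip(a, b)))
```

The cancellation is exact only when eps is negligible compared with the block norm. With a larger eps the result is only approximately contrast-invariant for blocks with small magnitudes.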

The term "histogram" is confusing here, because you do not count how many pixels have direction k; instead, you sum the magnitudes of those pixels. Thus you can normalize the contrast after computing a block's vector, or even after computing the whole vector, as long as you know at which indices in the vector each block starts and ends.
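As a rough sketch of the "normalize after computing the whole vector" option: if the blocks are stored back to back and all have the same length (an assumption made here for simplicity), each block's start and end index follows from the block length alone. The helper `normalize_blocks` is a hypothetical name used for illustration.

```python
import math

def normalize_blocks(descriptor, block_len, eps=1.0):
    """L2-normalize each consecutive block_len-sized slice of descriptor."""
    if len(descriptor) % block_len != 0:
        raise ValueError("descriptor length must be a multiple of block_len")
    out = []
    for start in range(0, len(descriptor), block_len):
        block = descriptor[start:start + block_len]
        k = math.sqrt(sum(x * x for x in block) + eps * eps)
        out.extend(x / k for x in block)
    return out

# Two toy blocks of length 3; eps=0.0 only to keep the numbers easy to read
# (it would divide by zero on an all-zero block).
flat = [3.0, 4.0, 0.0, 6.0, 8.0, 0.0]
normalized = normalize_blocks(flat, block_len=3, eps=0.0)
print(normalized)  # → [0.6, 0.8, 0.0, 0.6, 0.8, 0.0]
```

Both toy blocks end up with the same normalized values even though the second one has twice the magnitudes, which is the contrast normalization effect described above.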

The steps of the algorithm, as I understand them (this worked for me with a 95% success rate):

  1. Define the following parameters (in this example, the parameters are similar to those in the HOG for Human Detection paper):

    • A cell size in pixels (e.g. 6x6)
    • A block size in cells (e.g. 3x3, which means 18x18 in pixels)
    • Block overlap rate (e.g. 50%, which means that both the block width and the block height in pixels have to be even. This holds in this example, because the cell width and cell height are even (6 pixels), making the block width and height even as well)
    • Detection window size. The size must be divisible by half of the block size without remainder (so that the blocks can be placed exactly within the window with 50% overlap). For example, the block width is 18 pixels, so the window width must be a multiple of 9 (e.g. 9, 18, 27, 36, ...). The same goes for the window height. In our example, the window width is 63 pixels and the window height is 126 pixels.
  2. Calculate gradient:

    • Compute the X difference using convolution with the vector [-1 0 1]
    • Compute the Y difference using convolution with the transpose of the above vector
    • Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
    • Compute the gradient direction in each pixel using atan(diffY / diffX). Pixels where diffX is 0 need separate handling to avoid dividing by zero (or use atan2(diffY, diffX), which handles that case). Note that atan returns values between -90 and 90 degrees, while you will probably want values between 0 and 180, so flip all the negative values by adding 180 degrees to them. HOG for Human Detection uses unsigned directions (between 0 and 180). If you want signed directions, it takes a little more effort:
      • diffX and diffY both positive: atan gives a value between 0 and 90. Leave it as is.
      • diffX and diffY both negative: atan gives the same range of values. Add 180, so the direction is flipped to the other side.
      • diffX positive and diffY negative: atan gives a value between -90 and 0. Leave it as is (you can add 360 if you want it positive).
      • diffX negative and diffY positive: atan again gives a value between -90 and 0. Add 180, to flip the direction to the other side.
    • "Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
  3. Do the following for each block separately, using copies of the original matrices (because some blocks overlap and we do not want to destroy their data):

    • Split to cells
    • For each cell, create a vector with 9 members (one for each bin). At each index of the vector, store the sum of the magnitudes of all the pixels in the cell that have that binned direction. A cell has 6x6 = 36 pixels in total. So, for example, if 2 pixels have direction 0, the magnitude of the first one is 0.231 and the magnitude of the second one is 0.13, you should write the value 0.361 (= 0.231 + 0.13) at index 0 of your vector.
    • Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
    • Now, normalize this vector. Compute k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1), then divide each value in the vector by k. Your vector is now normalized.
  4. Create final vector:

    • Concatenate all the vectors of all the blocks into 1 large vector. In my example, the size of this vector was 6318 (6 x 13 = 78 block positions in the 63x126 window, times 81 values per block).
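
The steps above can be sketched in pure Python roughly as follows. This is a minimal illustration under a few assumptions that are not part of the steps themselves: the image is a list of rows of numbers, border pixels reuse their nearest neighbor when computing differences, and atan2 folded into [0, 180) is used in place of atan with the sign fix-up. All function and constant names are made up for this sketch, and it follows the simplified steps listed here rather than every detail of the paper.

```python
import math

CELL = 6                     # cell side in pixels
BLOCK_CELLS = 3              # block side in cells
BINS = 9                     # unsigned bins of 20 degrees each
BLOCK = CELL * BLOCK_CELLS   # block side in pixels (18)
STRIDE = BLOCK // 2          # 50% overlap (9)

def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation bin."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    bins = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # [-1 0 1] differences; borders clamp to the nearest pixel
            # (an assumption). The sign convention does not matter for
            # unsigned directions.
            dx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            dy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            # min() guards against floating-point rounding up to 180.0.
            bins[y][x] = min(int(angle // 20), BINS - 1)
    return mag, bins

def block_vector(mag, bins, top, left, eps=1.0):
    """Concatenate the cell histograms of one block and L2-normalize."""
    v = []
    for cy in range(BLOCK_CELLS):
        for cx in range(BLOCK_CELLS):
            hist = [0.0] * BINS
            for y in range(top + cy * CELL, top + (cy + 1) * CELL):
                for x in range(left + cx * CELL, left + (cx + 1) * CELL):
                    hist[bins[y][x]] += mag[y][x]  # magnitude-weighted vote
            v.extend(hist)
    k = math.sqrt(sum(a * a for a in v) + eps * eps)
    return [a / k for a in v]

def hog(img, eps=1.0):
    """Descriptor for a window whose sides are multiples of STRIDE."""
    h, w = len(img), len(img[0])
    mag, bins = gradients(img)
    out = []
    for top in range(0, h - BLOCK + 1, STRIDE):
        for left in range(0, w - BLOCK + 1, STRIDE):
            out.extend(block_vector(mag, bins, top, left, eps))
    return out

# Synthetic 63x126 (width x height) window, used only to check the size.
window = [[(x * y) % 256 for x in range(63)] for y in range(126)]
descriptor = hog(window)
print(len(descriptor))  # 6 * 13 blocks * 81 values = 6318
```

Because the histograms are read from shared `mag` and `bins` matrices and each block builds its own vector, overlapping blocks never overwrite each other's data, so no explicit copies are needed in this version.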
SomethingSomething answered Jun 03 '23 04:06
