I'm working with OpenCV's CPU version of Histogram of Oriented Gradients (HOG). I'm using a 32x32 image with 4x4 cells, 4x4 blocks, no overlap among blocks, and 15 orientation bins. OpenCV's HOGDescriptor
gives me a 1D feature vector of length 960. This makes sense, because (32*32 pixels) / (4*4 pixels per cell) * (15 orientations) = 960.
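As a sanity check on that arithmetic, here is a minimal sketch of the general length formula: blocks per window, times cells per block, times bins. `hogDescriptorLength` is a hypothetical helper written for this post, not an OpenCV function, and it assumes the window, block, stride, and cell sizes divide evenly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenCV): number of floats in the HOG
// descriptor of a single detection window.
std::size_t hogDescriptorLength(int winW, int winH,
                                int blockW, int blockH,
                                int strideW, int strideH,
                                int cellW, int cellH,
                                int nbins)
{
    assert(strideW > 0 && strideH > 0 && cellW > 0 && cellH > 0);
    assert(winW >= blockW && winH >= blockH && nbins > 0);

    // Block positions that fit in the window along each axis.
    const int blocksX = (winW - blockW) / strideW + 1;
    const int blocksY = (winH - blockH) / strideH + 1;

    // Cells inside one block.
    const int cellsPerBlock = (blockW / cellW) * (blockH / cellH);

    return static_cast<std::size_t>(blocksX) * blocksY * cellsPerBlock * nbins;
}
```

With my parameters, `hogDescriptorLength(32, 32, 4, 4, 4, 4, 4, 4, 15)` gives 960. For comparison, a 64x128 window with 16x16 blocks, 8x8 stride, 8x8 cells, and 9 bins gives 3780.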
However, I'm not sure about how these 960 numbers are laid out in memory. My guess would be that it's like this:
vector<float> descriptorsValues =
[15 bins for cell 0, 0]
[15 bins for cell 0, 1]
...
[15 bins for cell 0, 7]
....
[15 bins for cell 7, 0]
[15 bins for cell 7, 1]
...
[15 bins for cell 7, 7]
Of course, this is a 2D problem flattened into 1D, so it would actually look like this:
[cell 0, 0] [cell 0, 1] ... [cell 7, 0] ... [cell 7, 7]
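Whatever order the cells turn out to be stored in, if each cell contributes 15 consecutive floats, then pulling one histogram out of the flat vector is just a slice. A minimal sketch under that assumption (`histogramChunk` is a hypothetical helper, not part of OpenCV):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy out the k-th run of nbins consecutive values
// from a flat descriptor. It makes no claim about which cell that run
// belongs to; it only assumes histograms are stored back to back.
std::vector<float> histogramChunk(const std::vector<float>& descriptor,
                                  std::size_t k, std::size_t nbins)
{
    assert(nbins > 0);
    assert((k + 1) * nbins <= descriptor.size());

    const auto first = descriptor.begin() + static_cast<std::ptrdiff_t>(k * nbins);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(nbins));
}
```

For a 960-element descriptor with 15 bins, `k` runs from 0 to 63, and chunk `k` covers indices `15*k` through `15*k + 14`.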
Here's my example code for this:
#include <cstdio>
#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/objdetect.hpp>

// 32x32 image, 4x4 blocks, 4x4 cells, 4x4 blockStride
std::vector<float> hogExample(cv::Mat img)
{
    img = img.rowRange(0, 32).colRange(0, 32); // trim image to 32x32
    bool gamma_corr = true;
    cv::Size win_size(img.cols, img.rows); // cv::Size is (width, height); using just one window
    int c = 4;
    cv::Size block_size(c, c);
    cv::Size block_stride(c, c); // no overlapping blocks
    cv::Size cell_size(c, c);
    int nOri = 15; // number of orientation bins
    cv::HOGDescriptor d(win_size, block_size, block_stride, cell_size, nOri, 1, -1,
                        cv::HOGDescriptor::L2Hys, 0.2, gamma_corr, cv::HOGDescriptor::DEFAULT_NLEVELS);
    std::vector<float> descriptorsValues;
    std::vector<cv::Point> locations;
    d.compute(img, descriptorsValues, cv::Size(0, 0), cv::Size(0, 0), locations);
    std::printf("descriptorsValues.size() = %zu \n", descriptorsValues.size()); // prints 960
    return descriptorsValues;
}
Related resources: This StackOverflow post and this tutorial helped me to get started with the OpenCV HOGDescriptor.
I believe you have the right idea.
The original paper, Histograms of Oriented Gradients for Human Detection (page 2), says:
[...] The detector window is tiled with a grid of overlapping blocks in which Histogram of Oriented Gradient feature vectors are extracted. [...]
[...] Tiling the detection window with a dense (in fact, overlapping) grid of HOG descriptors and using the combined feature vector [...]
All it mentions is tiling the blocks together; it gives no detail on exactly how they are tiled. My guess is that nothing fancy happens here (otherwise they would have described it), i.e. the block histograms are simply concatenated in a regular scan order (from left to right, top to bottom).
After all, that is the most reasonable and easiest way to lay out the data.
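Edit: Looking at how people access and visualize the data is more convincing. Note that in the code below the outer loop runs over x and the inner loop over y, so consecutive blocks step down a column before moving one column to the right, i.e. the scan is column by column rather than row by row.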
for (int blockx=0; blockx<blocks_in_x_dir; blockx++)
{
    for (int blocky=0; blocky<blocks_in_y_dir; blocky++)
    {
        // 4 cells per block in this example; with one cell per block this loop would run once
        for (int cellNr=0; cellNr<4; cellNr++)
        {
            for (int bin=0; bin<gradientBinSize; bin++)
            {
                // descriptorDataIdx (declared outside this excerpt) walks the flat vector one value at a time
                float gradientStrength = descriptorValues[ descriptorDataIdx ];
                descriptorDataIdx++;
                // ... ...
            } // for (all bins)
        } // for (all cells)
    } // for (all block y pos)
} // for (all block x pos)
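The loop nest above implies a closed-form index, which is handy if you want one particular bin without walking the whole vector. The sketch below only restates that loop order (block x outermost, then block y, then cell, then bin). `HogLayout` and `flatIndex` are hypothetical names, and I have not checked the formula against OpenCV's source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of the grid the loops iterate over.
struct HogLayout {
    int blocksX;        // number of block positions along x
    int blocksY;        // number of block positions along y
    int cellsPerBlock;  // 4 in the loop above, 1 in the question's setup
    int nbins;          // orientation bins per cell
};

// Flat index reached by the loop nest when it arrives at
// (blockx, blocky, cellNr, bin), assuming the counter starts at 0.
std::size_t flatIndex(const HogLayout& g, int blockx, int blocky, int cellNr, int bin)
{
    assert(blockx >= 0 && blockx < g.blocksX);
    assert(blocky >= 0 && blocky < g.blocksY);
    assert(cellNr >= 0 && cellNr < g.cellsPerBlock);
    assert(bin >= 0 && bin < g.nbins);

    return ((static_cast<std::size_t>(blockx) * g.blocksY + blocky)
                * g.cellsPerBlock + cellNr) * g.nbins + bin;
}
```

For the question's setup (`HogLayout{8, 8, 1, 15}`), the block at x=0, y=1 starts at index 15 and the block at x=1, y=0 starts at index 120, so neighbouring chunks in memory are vertical neighbours in the image under this loop order.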