
Fastest HOG Feature Extraction implementation?

Question
What's the fastest open-source HOG extraction code for multicore CPUs?

Motivation
I'm working on a real-time object detection application. Specifically, I've developed a variant of Deformable Parts Model cascades, targeting 30fps object detection. I've reached a point where extracting HOG features is more expensive than the rest of my pipeline combined. I'm using the [Felzenszwalb, Girshick, et al] parameters for HOG extraction: a multiresolution pyramid of HOG descriptors, where each descriptor has a total of 32 bins covering orientation and a few other cues.

Goals
I'd like to do multiscale HOG feature extraction at 60fps (about 16.7ms per frame) for 640x480 images on a multicore CPU.

Related Work
I've benchmarked a few off-the-shelf multiscale HOG implementations on a 6-core Intel Core i7-3930K CPU. For a 640x480 image, I observe the following performance numbers:

  • HOG in Dubout's FFLD DPM code: 19fps (52ms) -- C++ with OpenMP, but no vectorization
  • HOG in voc-release5 DPM code: 2.4fps (410ms) -- single-threaded C++, plus a Matlab wrapper

I've also experimented with the OpenCV HOG extraction code. The OpenCV version works, but it seems to be hard-coded for the Dalal-Triggs HOG setup, and it doesn't seem to let me use the same HOG parameters (normalization scheme, binary position features, etc) as [Felzenszwalb, Girshick, et al]. The OpenCV version also doesn't natively support multiscale HOG, though you could do the downsampling yourself and call OpenCV HOG for each scale. I don't remember what the OpenCV HOG performance looked like.
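For the "do the downsampling yourself" route, the per-scale image sizes can be enumerated up front. The following is a minimal sketch, not code from any of the libraries mentioned: `PyramidLevel`, `pyramidLevels`, `interval` (levels per octave) and `minSize` are hypothetical names and parameters chosen for illustration. Each resulting size would then be used to resize the image before handing it to whichever HOG extractor is in use.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One level of a multiscale pyramid: the factor applied to the source image
// and the resulting size in pixels. (Hypothetical type for illustration.)
struct PyramidLevel {
    double scale;
    int width;
    int height;
};

// Enumerate the image sizes of a HOG pyramid. `interval` is the number of
// levels per octave (per halving of the image) and `minSize` is the smallest
// side length still worth processing. Both are illustrative assumptions.
std::vector<PyramidLevel> pyramidLevels(int width, int height,
                                        int interval, int minSize) {
    assert(width > 0 && height > 0 && interval > 0 && minSize > 0);
    std::vector<PyramidLevel> levels;
    for (int i = 0;; ++i) {
        // Level i shrinks the image by a factor of 2^(i / interval).
        const double scale = std::pow(2.0, -static_cast<double>(i) / interval);
        const int w = static_cast<int>(std::lround(width * scale));
        const int h = static_cast<int>(std::lround(height * scale));
        if (w < minSize || h < minSize) break;  // too small to be useful
        levels.push_back({scale, w, h});
    }
    return levels;
}
```

With a 640x480 input and `interval = 5`, level 5 is exactly half size (320x240) and level 10 is quarter size (160x120). Because the levels are independent of each other, they are a natural unit of work to distribute across cores.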

Final Thoughts

  1. The fastest HOG implementation I benchmarked, FFLD, seems to leave a lot of performance on the table. I haven't done a GFLOP/s estimate, but I do notice that FFLD's HOG code doesn't use any SSE/AVX vectorization. There isn't much control flow, so vectorization seems like a cheap speedup opportunity here.
  2. I haven't mentioned GPU HOG implementations here. I've experimented with groundHOG/CUHOG and fasthog. The CUHOG authors claim 20fps (50ms) HOG extraction on an NVIDIA GTX560. However, Intel CPUs are the target platform for my application, and copying a full HOG pyramid from the GPU to the CPU is prohibitively expensive.
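On the point about limited control flow: one common way to assign a gradient to an orientation bin avoids `atan2` entirely and instead picks the direction with the largest dot product. The sketch below assumes 18 contrast-sensitive bins in 20-degree steps; `orientationBin` is a hypothetical helper written for illustration, not code taken from FFLD or voc-release5.

```cpp
#include <cassert>
#include <cmath>

// Snap a gradient (dx, dy) to one of 18 contrast-sensitive orientation bins
// by finding the unit vector with the largest dot product. Bins 0..8 cover
// 0..160 degrees in 20-degree steps; bins 9..17 are the opposite directions.
int orientationBin(float dx, float dy) {
    assert(std::isfinite(dx) && std::isfinite(dy));
    const float kPi = 3.14159265358979f;
    float best = 0.0f;
    int bestBin = 0;
    for (int o = 0; o < 9; ++o) {
        // A tuned version would precompute these nine unit vectors once.
        const float angle = o * kPi / 9.0f;
        const float dot = std::cos(angle) * dx + std::sin(angle) * dy;
        if (dot > best) {
            best = dot;          // gradient points along direction o
            bestBin = o;
        } else if (-dot > best) {
            best = -dot;         // gradient points opposite to direction o
            bestBin = o + 9;
        }
    }
    return bestBin;
}
```

The loop has a fixed trip count and its branches are simple compare-and-select steps, which is the kind of pattern that tends to map onto SIMD max and blend operations across several pixels at once.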
asked Aug 27 '13 20:08 by solvingPuzzles





1 Answer

Have a look at the following implementation: HoG SSE.

It does fit your time requirements. It is written in C and uses 128-bit SIMD instructions.
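To illustrate what 128-bit SIMD means in this context, here is a generic sketch of the gradient-magnitude step that precedes orientation binning. It is not code from the linked implementation: `gradientMagnitudeRow` and the `HOG_SKETCH_USE_SSE` macro are hypothetical names, and the image is assumed to be single-channel `float`. Four pixels are processed per iteration with SSE intrinsics, with a scalar loop for the tail and for builds without SSE.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HOG_SKETCH_USE_SSE 1
#endif

// Gradient magnitude for the interior pixels of one image row using central
// differences. `above`, `row`, and `below` are three consecutive rows of a
// single-channel float image of the given width; out[0] and out[width - 1]
// are left untouched.
void gradientMagnitudeRow(const float* above, const float* row,
                          const float* below, float* out, std::size_t width) {
    assert(above && row && below && out);
    std::size_t x = 1;
#ifdef HOG_SKETCH_USE_SSE
    // Four pixels per iteration in one 128-bit register. The bound keeps the
    // load of row[x + 1 .. x + 4] inside the row.
    for (; x + 4 < width; x += 4) {
        const __m128 dx = _mm_sub_ps(_mm_loadu_ps(row + x + 1),
                                     _mm_loadu_ps(row + x - 1));
        const __m128 dy = _mm_sub_ps(_mm_loadu_ps(below + x),
                                     _mm_loadu_ps(above + x));
        const __m128 sq = _mm_add_ps(_mm_mul_ps(dx, dx), _mm_mul_ps(dy, dy));
        _mm_storeu_ps(out + x, _mm_sqrt_ps(sq));
    }
#endif
    // Scalar loop: handles the tail, or every pixel when SSE is unavailable.
    for (; x + 1 < width; ++x) {
        const float dx = row[x + 1] - row[x - 1];
        const float dy = below[x] - above[x];
        out[x] = std::sqrt(dx * dx + dy * dy);
    }
}
```

Unaligned loads and stores are used so the sketch works on any buffer. An implementation tuned for speed would more likely align its rows and might skip the square root where only relative magnitudes matter.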

The code can also be customized further, depending on the normalization strategy and output type you need.

I would be glad to hear your feedback so that I can improve this code.

answered Oct 15 '22 04:10 by ivan_a
