I'm working on a project, part of which integrates OpenCV's HOG people detector with a camera stream.
The camera and the basic HOG detector (the C++ detectMultiScale -> http://docs.opencv.org/modules/gpu/doc/object_detection.html) are working, but not very well: the detections are very noisy and the algorithm isn't very accurate.
Why?
My camera image is 640 x 480 pixels.
The code snippet I'm using is:
std::vector<cv::Rect> found, found_filtered;
cv::HOGDescriptor hog;
hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
// detectMultiScale(img, foundLocations, hitThreshold, winStride, padding, scale, finalThreshold)
hog.detectMultiScale(image, found, 0, cv::Size(8,8), cv::Size(32,32), 1.05, 2);
Why doesn't it work properly? What do I need to improve the accuracy? Is a particular image size necessary?
PS: Do you know of a more precise people detection algorithm that is faster and written in C++?
However, OpenCV has a built-in method to detect pedestrians: a pre-trained HOG (Histogram of Oriented Gradients) + Linear SVM model for detecting pedestrians in images and video streams. Rather than looking at individual pixel values, the descriptor summarizes the gradient orientations in small regions around each pixel.
The value returned by hog.detectMultiScale is just a list of bounding rectangles, one per detection. Therefore, to get the total number of people in the image, just take the number of rectangles: len(rects) in Python, or found.size() in the C++ snippet above.
The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image.
The window size of the default people detector is 64x128, which means the people you want to detect have to appear at least 64x128 pixels in the image. At your camera resolution of 640x480, a person would have to take up quite some space in the frame before getting properly detected.
Depending on your specific situation, you could try training your own HOG descriptor with a smaller window size. You could take a look at this answer and the referenced library if you want to train your own HOG descriptor.
For the Parameters:
win_stride: Given that your input image is 640 x 480 and the default people detector has a window size of 64x128, the HOG detection window (the 64x128 window) fits into the input image many times over. The win_stride tells HOG how far to move the detection window each step. How does this work: HOG places the detection window at the top left of your input image, then shifts the detection window by win_stride each time.
A smaller win_stride should improve accuracy but decreases performance, and vice versa.
padding: Padding adds a certain number of extra pixels on each side of the input image, so that the detection window can be placed partly outside the original image. It's because of that padding that HOG can detect people who are very close to the edge of the input image.
group_threshold: The group_threshold determines when detected rectangles should be merged into a group. A low value performs no result grouping; a higher value groups results once at least that many overlapping detections have been found. (In my own experience, I have never needed to change the default value.)
I hope this makes a bit of sense to you. I've been working with HOG for the past few weeks and have read a lot of papers, but I've lost some of the references, so I can't link you to the pages this info comes from, I'm sorry.