
OpenCV HOG Descriptor Parameters

I am trying to detect people from a camera's feed using cv2.HOGDescriptor() and using their default people classifier.

The detector kind of works, but I honestly have trouble understanding what values to assign to winStride, padding, scale, and groupThreshold respectively.

Currently, the camera feed's frame size is 1280 x 720, which I resize to 400 x 400 before calling detectMultiScale with these parameters:

hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05, 'finalThreshold': 2}

Based on this answer, I understand what these parameters do and represent.

My question is: is there a way of mapping image size to these values? A mathematical equation? An estimation method? I am not necessarily asking for a concrete formula, or even a method that yields every value, but anything better than trial and error or magic numbers.

Most references and tutorials simply use magic numbers without explaining how they arrived at them.

PS: Here's a visual aid in case my question is still unclear: I am looking for the cloud.

asked Oct 17 '22 by eshirima

1 Answer

There is no silver bullet here. It is unfortunately very handwavy as the optimal solution will vary from input data to input data.

Here is a little extra guidance:

  • If stride > window size, your detector might not even be run on the person. I always think of stride in relation to the window size, e.g. 64/8.
  • If scale ≈ 1, not much will happen. Values like 1.2 or 1.3 are usually better. This parameter scales the image down and then runs the detector again. The hope is that if people were too big for the detector in the first run, they might be the right size after scaling down. E.g. if your detector size is the default 64x128 but some person in the image is 150px high, the detector might not realize it's a person, as it can only view the legs or the torso at once. If we scale down by 1.2, 150 / 1.2 = 125, and the person might now actually be detected. (Silly numbers; it is very plausible that a 150px person would be detected anyway, but you get the idea.)
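To make the effect of scale concrete, here is a small counting sketch (plain Python; pyramid_levels is a hypothetical helper mirroring how a multi-scale detector builds its image pyramid, assuming the default 64x128 window):

```python
def pyramid_levels(img_w, img_h, scale, win_w=64, win_h=128):
    """Count how many pyramid levels fit before the scaled-down
    image becomes smaller than the detection window."""
    levels = 0
    w, h = img_w, img_h
    while w >= win_w and h >= win_h:
        levels += 1
        w, h = w / scale, h / scale
    return levels

# A small scale factor searches person sizes far more densely
# (and is correspondingly slower) than a large one:
print(pyramid_levels(400, 400, 1.05))  # → 24
print(pyramid_levels(400, 400, 1.3))   # → 5
```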

The best way to go about it is to experiment a little. Choose some images/videos that you think are representative of your use case, build an end-to-end setup, and try a couple of different parameter settings. If persons are not detected, think about their sizes in relation to your detector size. Are they bigger than that? Smaller? If they are smaller, perhaps increase the scale factor or the number of levels. If they are bigger, downscale the input image more.

... 1280 X 720 and I resize it to 400 X 400 ...

Side note: if you simply resize without cropping, you will get distorted results. Either resize to a size with the same aspect ratio, such as 711x400, or crop the initial image to a square before resizing.
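As a quick sanity check on the 711x400 figure, the aspect-preserving target size is plain arithmetic (resize_dims is a hypothetical helper):

```python
def resize_dims(src_w, src_h, target_h):
    """Width/height after scaling so the height becomes target_h
    while preserving the source aspect ratio."""
    return round(src_w * target_h / src_h), target_h

print(resize_dims(1280, 720, 400))  # → (711, 400)
```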

answered Oct 21 '22 by Aske Doerge