I am trying to detect people in a camera feed using cv2.HOGDescriptor() with its default people detector.
The detector sort of works, but I am having trouble understanding what values to assign to winStride, padding, scale, and finalThreshold (the grouping threshold, called groupThreshold in some OpenCV versions).
Currently, the camera feed's frame size is 1280 x 720. I resize each frame to 400 x 400 and then call detectMultiScale with the parameters
hogParams = {'winStride': (8, 8), 'padding': (32, 32), 'scale': 1.05, 'finalThreshold': 2}
Based on this answer, I understand what these parameters do and represent.
My question is: is there a way to map image size to these values? A mathematical equation? An estimation method? I am not necessarily asking for a concrete formula that yields every value, just something better than trial and error and magic numbers.
Most references and tutorials simply use magic numbers without explaining how they arrived at them.
PS: Here's a visual aid in case you're still not sure of my question
There is no silver bullet here. Unfortunately this is quite hand-wavy: the optimal settings vary from one input dataset to another.
Here is a little extra guidance:
The best way to go about it is to experiment a little. Choose some images/videos that you think are representative of your use case, build an end-to-end setup, and try a few different parameter settings. If people are not detected, think about their size in the frame relative to the detector window (64 x 128 for the default people detector). Are they bigger than that? Smaller? If they are bigger, downscale the input image more, or use a finer scale factor so the pyramid covers more levels. If they are smaller, the pyramid cannot help you, since it only downscales; you would have to upscale the input instead.
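As a rough rule of thumb (my own back-of-the-envelope estimate, not something from OpenCV's docs), you can count how many pyramid levels a given scale factor yields before the downscaled image no longer fits the 64 x 128 detection window, which tells you the range of person sizes the detector can cover:

```python
import math

def pyramid_levels(img_w, img_h, scale=1.05, win_w=64, win_h=128):
    """Approximate number of scale levels detectMultiScale will evaluate.

    The pyramid stops once the downscaled image no longer fits the
    win_w x win_h detection window.
    """
    max_ratio = min(img_w / win_w, img_h / win_h)
    if max_ratio < 1:
        return 0  # image already smaller than the detection window
    return int(math.log(max_ratio, scale)) + 1

# For a 400x400 image: min(400/64, 400/128) = 3.125, log_1.05(3.125) ~ 23.3
print(pyramid_levels(400, 400, 1.05))  # → 24
```

At the first level the smallest detectable person is 64 x 128 pixels; each further level catches people roughly `scale` times larger, so a smaller `scale` gives a denser (but slower) sweep over person sizes.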
.. 1280 X 720 and I resize it to 400 X 400...
Side note: If you are simply resizing without cropping you will get bad results. Either resize to the same aspect ratio such as 711x400, or crop the initial image to a square before resizing.