I'm taking my first steps in making a Haar cascade for custom object recognition. I've spent time gathering a fair bit of data and wrote some preprocessing scripts to convert videos to frames. My next step is to crop the object of interest in order to create some positive training examples. I have a few questions that I genuinely have looked around online for answers to, but I'm still slightly confused:
Aspect ratio: I read I should aim to keep the aspect ratio the same. Does this mean the same as the original frame, OR the same across all the images I want to use as positive training examples (i.e. frames from completely different videos)?
Size: aspect ratio and size are obviously not the same thing. So again, do I need to ensure my positive samples all have the same height and width? (I'm pretty sure they should, but thought it worth double-checking.)
Also in terms of size, I have come across some blogs recommending, for instance, 24 x 24 (H x W). What if the object I want to detect is not square? In my case it's a rectangle whose height is around double its width, for instance a plastic bottle. Do I leave the size as it is, or should I convert it to 24 x 24?
Negative samples: should these all have the same aspect ratio and/or size?
I understand this is probably a very low-level/silly question, however it's been far from clear what best practice is here!
I have come across a couple of other answers on here, but I don't feel they offer a satisfactory answer, and the field has moved on significantly in the past couple of years.
Thanks
The positive samples are packed into a .vec file, which is needed for training. The opencv_createsamples binary creates this .vec file and automatically scales your object regions (defined in a .txt file) to the target size. All your positive object regions should have roughly the same aspect ratio, because otherwise the automatic scaling would distort them.
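Since the scaling step assumes roughly equal aspect ratios, it can help to check your annotations before packing them. A minimal sketch in plain Python, assuming each ROI is an (x, y, w, h) tuple (the order the annotation .txt file uses); the function name and the 5% tolerance are arbitrary illustrative choices, not anything OpenCV defines.

```python
from statistics import median


def check_aspect_ratios(rois, tolerance=0.05):
    """Return (typical w/h ratio, ROIs deviating from it by more than `tolerance`).

    The median is used so a few badly labelled boxes don't drag the reference
    ratio towards themselves. `tolerance` is a relative deviation (0.05 = 5%).
    """
    ratios = [w / h for (_x, _y, w, h) in rois]
    typical = median(ratios)
    outliers = [roi for roi, r in zip(rois, ratios)
                if abs(r - typical) / typical > tolerance]
    return typical, outliers


cups = [(653, 154, 1295, 1523), (1068, 406, 1551, 1824), (3036, 1159, 852, 1004)]
typical, outliers = check_aspect_ratios(cups + [(0, 0, 100, 200)])
print(round(typical, 2), outliers)
```

Anything reported as an outlier is a candidate for relabelling or padding before it goes into the .txt file.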
The target size should be the minimum size at which you want to detect an object (but if it is too small, the relevant features won't be there anymore), and its aspect ratio should match the aspect ratio of your object regions.
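Once you know the aspect ratio of your regions, the target size follows from the smallest width you still expect to detect. A small sketch with a hypothetical helper name; because pixel sizes are integers, the resulting ratio only approximates the requested one.

```python
def target_size(aspect_ratio, width):
    """Return (w, h) for a training window of `width` pixels whose w/h is close to `aspect_ratio`."""
    if aspect_ratio <= 0 or width <= 0:
        raise ValueError("aspect_ratio and width must be positive")
    height = max(1, round(width / aspect_ratio))
    return width, height


print(target_size(0.5, 12))   # a bottle about twice as tall as wide
print(target_size(0.85, 20))
```

For the bottle in the question (height about double the width, so w/h = 0.5), this gives a 12x24 window rather than 24x24.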
For example: you have a lot of images of cups. The image resolutions vary, but the aspect ratio of each cup (only the cup region within the image, not the background) is about 1:2 (width:height). So you either crop every image so it holds only the cup and minimal background, add each cropped image to the .txt file and give the full size of the cropped image as its ROI, or you select the ROI of the cup, add the full-size image to the .txt file and give that ROI there. Then you select a target size like 20x40 or 10x20, or whatever 1:2 size you think can be trained.
The negative samples should stay as they are; the training will automatically choose and search subimages of those samples. Just make sure that there are no cups (to stay with the example) in them.
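Because the negatives are used as they are, all the training needs is a background description file (passed to opencv_traincascade via -bg) listing one image path per line. A minimal sketch that builds it from a folder; the folder layout, the name bg.txt and the accepted suffixes are assumptions.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image types


def write_background_file(neg_dir, out_file="bg.txt"):
    """Write every image path in `neg_dir` (sorted, one per line) to `out_file`."""
    paths = sorted(p for p in Path(neg_dir).iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES)
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

No resizing or cropping happens here, which matches the point above: the negatives keep whatever size and aspect ratio they have.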
I've had some good results by drawing black boxes over the objects in the positive samples and using the resulting images as negative samples too, to get more negative samples, but that might depend on your particular task.
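The black-box trick is a one-line array assignment if the image is a NumPy array (which is what cv2.imread returns). A minimal sketch with a hypothetical helper name; it blacks out exactly the labelled boxes, so you may want to pad them slightly if object edges stick out.

```python
import numpy as np


def black_out_rois(image, rois):
    """Return a copy of `image` with every (x, y, w, h) ROI filled with zeros (black)."""
    out = image.copy()
    for x, y, w, h in rois:
        out[y:y + h, x:x + w] = 0  # rows are indexed by y, columns by x
    return out


# e.g. with OpenCV: cv2.imwrite("neg_image1.png", black_out_rois(cv2.imread("image1.jpg"), rois))
img = np.full((10, 10, 3), 255, dtype=np.uint8)
masked = black_out_rois(img, [(2, 3, 4, 5)])
```

The copy keeps the original positive image untouched, so it can still be used for the .vec file.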
As a more concrete example, I've taken two cup images from Wikimedia.
There is 1 cup in the first image and 2 cups in the second image. I chose not to include the handle during training and picked an aspect ratio of 0.85 (1:1.176 w:h). Now you can either write the ROIs to the .txt file, like this:
image1.jpg 1 653 154 1295 1523
image2.jpg 2 1068 406 1551 1824 3036 1159 852 1004
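Each line holds the image path, the number of objects, and then x y w h (top-left corner, width, height) for every object. If your labels live in a Python structure, a tiny helper (hypothetical name) can produce exactly these lines.

```python
def info_line(image_path, rois):
    """Format one annotation line: path, object count, then x y w h per object."""
    coords = " ".join(f"{x} {y} {w} {h}" for x, y, w, h in rois)
    return f"{image_path} {len(rois)} {coords}"


annotations = {
    "image1.jpg": [(653, 154, 1295, 1523)],
    "image2.jpg": [(1068, 406, 1551, 1824), (3036, 1159, 852, 1004)],
}
lines = [info_line(path, rois) for path, rois in annotations.items()]
print("\n".join(lines))
```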
Or you can first crop the images to these regions and then create a .txt file like this:
cropped_image1_cup1.jpg 1 0 0 1295 1523
cropped_image2_cup1.jpg 1 0 0 1551 1824
cropped_image2_cup2.jpg 1 0 0 852 1004
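The cropped variant can be derived from the same labels: every crop covers exactly one ROI, so its own annotation is always 0 0 w h. A sketch that only generates names and lines; the cropped_<image>_cup<n> naming simply mirrors the example, and the helper name is hypothetical.

```python
from pathlib import Path


def cropped_annotations(annotations, label="cup", suffix=".jpg"):
    """Yield (crop name, source image, roi, annotation line) for every ROI."""
    for image_path, rois in annotations.items():
        stem = Path(image_path).stem
        for i, (x, y, w, h) in enumerate(rois, start=1):
            name = f"cropped_{stem}_{label}{i}{suffix}"
            yield name, image_path, (x, y, w, h), f"{name} 1 0 0 {w} {h}"


annotations = {
    "image1.jpg": [(653, 154, 1295, 1523)],
    "image2.jpg": [(1068, 406, 1551, 1824), (3036, 1159, 852, 1004)],
}
rows = list(cropped_annotations(annotations))
print("\n".join(row[3] for row in rows))
```

The pixel crop itself is a slice such as img[y:y + h, x:x + w] on the loaded image; passing suffix=".png" avoids another round of JPEG compression.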
Both should create the same .vec file (provided the cropping didn't introduce any artifacts such as additional JPEG compression; better to use PNG ;) ).
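Either .txt file is then handed to opencv_createsamples through -info, together with the output file (-vec), the number of samples (-num) and the target size (-w, -h). A minimal sketch that only builds the argument list, assuming opencv_createsamples is on your PATH; the file names are placeholders.

```python
def createsamples_cmd(info_file, vec_file, num, width, height):
    """Build the opencv_createsamples call that packs annotated positives into a .vec file."""
    return [
        "opencv_createsamples",
        "-info", str(info_file),
        "-vec", str(vec_file),
        "-num", str(num),
        "-w", str(width),
        "-h", str(height),
    ]


cmd = createsamples_cmd("positives.txt", "positives.vec", 3, 20, 24)
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

The same -w/-h values have to be given to opencv_traincascade later, since the .vec file stores samples at that size.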
You could then choose a target size of 20x24, for example (aspect ratio 1:1.2). It is a good idea to write a script or tool that fixes the aspect ratio of your labeled input images. That way you don't have to label your objects with the perfect aspect ratio, which is much easier and more intuitive: label them as they are, then post-process by adjusting the ROIs to fit the aspect ratio (adding some extra background at the left/right or top/bottom if necessary). Or ignore the aspect ratio difference, if some deformation is OK for you.
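The ROI post-processing described above could look like this minimal sketch (hypothetical helper name). It only ever grows the box, adding background left/right when the region is too narrow and top/bottom when it is too wide, and keeps it centred. Shifting the box back inside the image, and returning None when it cannot fit, are my own assumptions about how to handle borders.

```python
def fit_aspect_ratio(roi, aspect_ratio, image_size=None):
    """Grow an (x, y, w, h) ROI so that w / h is close to `aspect_ratio`.

    Never cuts into the object. If image_size=(img_w, img_h) is given, the box is
    shifted to stay inside the image; None is returned if it is larger than the image.
    Without image_size the result may have negative coordinates.
    """
    x, y, w, h = roi
    if w / h < aspect_ratio:      # too narrow: add background left/right
        new_w, new_h = round(h * aspect_ratio), h
    else:                         # too wide (or exact): add background top/bottom
        new_w, new_h = w, round(w / aspect_ratio)
    new_x = x - (new_w - w) // 2
    new_y = y - (new_h - h) // 2
    if image_size is not None:
        img_w, img_h = image_size
        if new_w > img_w or new_h > img_h:
            return None
        new_x = min(max(new_x, 0), img_w - new_w)
        new_y = min(max(new_y, 0), img_h - new_h)
    return new_x, new_y, new_w, new_h


print(fit_aspect_ratio((10, 10, 20, 60), 0.5))              # widened
print(fit_aspect_ratio((10, 10, 40, 40), 0.5, (100, 100)))  # heightened, shifted down into the image
```

Samples that come back as None are the ones you would either relabel or accept with some deformation.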