I am trying to build haar cascades for doing OCR of a specific font; one classifier per character.
I can generate tons of training data just by drawing the font onto images. So, the plan is to generate positive training data for each character, and use the examples of other characters as negative training data.
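Since the plan is one classifier per character, each character's positives can be listed in the annotation ("info") file format that OpenCV's cascade training tools expect: one line per image with the path, the object count, and an `x y w h` box per object. A minimal stdlib-only sketch, assuming the glyph images have already been rendered elsewhere; the directory names, filenames, and 24x24 size here are made up for illustration:

```python
# Sketch: write an OpenCV-style "info" file for one character's positives.
# Assumes glyph images were already rendered; all names are hypothetical.
import os

def write_info_file(char_dir, out_path, w=24, h=24):
    """List every image in char_dir as one full-frame positive sample."""
    with open(out_path, "w") as f:
        for name in sorted(os.listdir(char_dir)):
            if name.endswith(".png"):
                # format: <path> <num_objects> <x> <y> <width> <height>
                f.write(f"{char_dir}/{name} 1 0 0 {w} {h}\n")

os.makedirs("positives/A", exist_ok=True)
# stand-in empty files; real ones would be rendered glyph images
for i in range(3):
    open(f"positives/A/a_{i}.png", "wb").close()

write_info_file("positives/A", "A.info")
```

The same info file then feeds `opencv_createsamples` to build the `.vec` file the trainer consumes.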
I am wondering how much variation I should put into the training data. Normally I'd just try everything, but I gather these things take days to train (for each character!) so some advice would be good.
So, a few questions:
Thanks!
It's basically a machine learning algorithm that uses many images of faces and non-faces to train a classifier that can later detect faces in real time. The implementation in OpenCV can also detect other objects, as long as you have the right classifiers.
Cascade classifier training requires a set of positive samples and a set of negative images. You must provide a set of positive images with regions of interest specified to be used as positive samples. You can use the Image Labeler to label objects of interest with bounding boxes.
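With the positives packed into a `.vec` file and the negatives listed one path per line in a background file, training is typically driven from the command line. A sketch of the usual invocation for one character; paths and parameter values are illustrative, and note the legacy `opencv_createsamples`/`opencv_traincascade` tools ship with OpenCV 3.x but were dropped from 4.x:

```shell
# Pack the positives listed in A.info into a .vec file (paths/params illustrative)
opencv_createsamples -info A.info -vec A.vec -w 24 -h 24 -num 1000

# Train the cascade for character "A"; bg.txt lists negative images, one per line
opencv_traincascade -data cascade_A/ -vec A.vec -bg bg.txt \
    -numPos 900 -numNeg 2000 -w 24 -h 24 \
    -featureType HAAR -numStages 15
```

The trained stages land in `cascade_A/cascade.xml`, which `cv2.CascadeClassifier` can load directly.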
An LBP cascade can be trained to perform similarly to (or better than) a Haar cascade, but out of the box the Haar cascade is about 3x slower and, depending on your data, about 1-2% better at accurately detecting the location of a face.
A Haar cascade works as a classifier: it separates positive data points (those that are part of the detected object) from negative data points (those that don't contain the object). Haar cascades are fast and can work well in real time, but they are not as accurate as modern object detection techniques.
Does the training algorithm recognise that I don't care about transparent pixels? Or will it perform better if I superimpose the characters over different backgrounds?
The more "noise" you put into your training images, the more robust the classifier will be, but yes, the longer it will take to train. This is also where your negative samples come into action: the more negative training samples you have, covering as wide a range as possible, the more robust your detectors will be. That being said, if you have a particular use case in mind, I would suggest skewing your training sets slightly to match it; the result will be less robust in general but much better in your application.
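On the transparency question specifically: the trainer has no notion of "don't care" pixels, so one way to get the effect is to composite each glyph over varied backgrounds before training. A toy stdlib-only sketch of the idea on a nested-list "image"; a real pipeline would do the same with PIL or NumPy on actual renders, and the mask below is made up:

```python
# Sketch: composite a binary glyph mask over random backgrounds, so the
# trainer never sees "transparent" pixels. Toy example on nested lists.
import random

def composite(mask, fg=0, seed=0):
    """Keep glyph pixels (mask==1) dark; fill background with random grey noise."""
    rng = random.Random(seed)
    return [[fg if px else rng.randint(64, 255) for px in row] for row in mask]

glyph = [[0, 1, 0],
         [1, 1, 1],
         [1, 0, 1]]   # tiny hypothetical glyph mask

sample = composite(glyph, seed=1)
# regenerating with different seeds yields the varied-background positives
```

Generating many such composites per character gives the background variation discussed above without hand-collecting images.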
Should I include images where each character is shown with different prefixes and suffixes, or should I just treat each character individually?
If you want to detect individual letters, then train individually. If you train it to detect "ABC" and you only want "A", it is going to get mixed messages. Simply train each letter ("A", "B", etc.) and your detector should be able to pick out each individual letter in larger images.
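Training one classifier per letter, with the other letters' images serving as that classifier's negatives (the plan from the question), is mostly bookkeeping. A stdlib-only sketch that builds a per-letter background file and the corresponding training command; directory layout and flags are illustrative:

```python
# Sketch: per-letter training setup where each classifier's negatives are
# the other characters' images. Paths and tool flags are illustrative.
import os

letters = ["A", "B", "C"]

# stand-in glyph images (real ones would be rendered from the font)
for ch in letters:
    os.makedirs(f"glyphs/{ch}", exist_ok=True)
    for i in range(2):
        open(f"glyphs/{ch}/{ch}_{i}.png", "wb").close()

commands = []
for target in letters:
    # negatives for this classifier = every other character's images
    with open(f"bg_{target}.txt", "w") as bg:
        for other in letters:
            if other != target:
                for name in sorted(os.listdir(f"glyphs/{other}")):
                    bg.write(f"glyphs/{other}/{name}\n")
    commands.append(
        f"opencv_traincascade -data cascade_{target}/ "
        f"-vec {target}.vec -bg bg_{target}.txt -w 24 -h 24"
    )
```

Each command can then be launched independently, which also makes it easy to spread the multi-day training runs across machines.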
Should I include images where the character is scaled up and down? I gather the algorithm pretty much ignores size, and scales everything down for efficiency anyway?
I don't believe this is correct. AFAIK the Haar detector cannot search below its trained window size. So if you train all your images on 50x50 letters but the letters in your images are 25x25, you won't detect them. Train the other way round, however, and you will get results. Start small, and let the algorithm scale the window up for you.
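The point above can be made concrete by listing the window sizes a `detectMultiScale`-style search actually covers: it starts at the trained size and multiplies by a scale factor, so nothing smaller than the trained window is ever reachable. A small sketch (the 1.1 scale factor mirrors OpenCV's default `scaleFactor`; the sizes are illustrative):

```python
# Sketch: which object sizes a cascade trained at `trained` pixels can find
# in an image `image` pixels across. The search window only grows, never
# shrinks below the trained size -- hence "start small".
def detectable_sizes(trained, image, scale_factor=1.1):
    sizes, w = [], float(trained)
    while w <= image:
        sizes.append(round(w))
        w *= scale_factor
    return sizes

print(detectable_sizes(24, 100))  # 24, 26, 29, ... up to ~100
print(detectable_sizes(50, 40))   # empty: 25x25 letters are invisible to a 50x50 model
```

So a model trained at 24x24 covers everything from 24 pixels up, while a 50x50 model can never see smaller letters at all.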