I was just wondering how accurate Tesseract can be for handwriting recognition when the input is capital letters, each written in its own little box on a form.
I know you can train it to recognise your own handwriting to some extent, but the problem in my case is that I need it to work across many different people's handwriting. Can anyone point me in the right direction?
Thanks a lot.
While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, that is, its ability to correctly recognize characters in a scan or image.
What you need is something called an optical character recognition (OCR) tool. OCR tools analyze the handwritten or typed text in images and convert it into editable text. Some tools even have spell checkers that give additional help in the case of unrecognizable words.
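Since the form in the question has one capital letter per box, here is a minimal sketch of how an OCR call on a single box might look. It assumes the `pytesseract` wrapper and a Tesseract install are available and that each box has already been cropped to its own image. The helper names `build_box_config` and `recognize_box` are made up for illustration. `--psm 10` asks Tesseract to treat the image as a single character, and `tessedit_char_whitelist` restricts the output alphabet. Check that your Tesseract version honors the whitelist, and note that none of this makes the stock model good at handwriting.

```python
import string

# Page segmentation mode 10: treat the image as a single character.
SINGLE_CHAR_PSM = 10


def build_box_config(allowed=string.ascii_uppercase):
    """Build a Tesseract config string for one boxed character (hypothetical helper)."""
    return f"--psm {SINGLE_CHAR_PSM} -c tessedit_char_whitelist={allowed}"


def recognize_box(box_image):
    """Run Tesseract on one cropped box image and return the recognized text."""
    # Imported here so the config helper can be used without the OCR dependency.
    import pytesseract

    text = pytesseract.image_to_string(box_image, config=build_box_config())
    return text.strip()
```

You would call `recognize_box` once per cropped box (for example with a PIL image) and join the results to get the field value.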
Connectionist Temporal Classification (CTC) is an algorithm used for tasks such as speech recognition and handwriting recognition.
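To make the CTC idea a bit more concrete: a CTC-trained network emits one label per time step, including a special "blank" label, and the output text is recovered by merging consecutive repeats and then dropping the blanks. Below is a rough sketch of that greedy decoding step only. The `-` blank symbol and the function name are assumptions for illustration, the network itself is not shown, and real systems often use beam search instead.

```python
from itertools import groupby

BLANK = "-"  # stand-in symbol for the CTC blank label (an assumption for this sketch)


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse a per-time-step label sequence into output text.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank label.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)
```

The blank is what lets a genuine double letter survive: in `HH-EE-LL-LLOO` the two `L` runs are separated by a blank, so they decode to `LL` rather than being merged into one.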
In short, you would have to train the Tesseract engine to recognize the handwriting. Take a look at this link:
Tesseract handwriting with dictionary training
This is what the linked post says:
It's possible to train tesseract to recognize handwriting. Here are the instructions: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract
But don't expect very good results. Academic results have typically topped out at about 90% accuracy. Here are a couple of references for words and numbers. So if your use case can tolerate an error rate of at least 1 in 10, this might work for you.
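To get a feel for what roughly 90% means on a form, here is a back-of-the-envelope calculation. It assumes the figure is a per-character accuracy and that errors in different boxes are independent, which is a simplification. Under that assumption the chance that a whole field comes out right is the per-character accuracy raised to the number of boxes.

```python
def field_accuracy(char_accuracy, num_chars):
    """Probability that every character in a field is recognized correctly,
    assuming independent errors with the same per-character accuracy."""
    return char_accuracy ** num_chars


if __name__ == "__main__":
    # Print the expected whole-field accuracy for a few field lengths.
    for length in (1, 5, 10, 20):
        print(length, round(field_accuracy(0.9, length), 3))
```

For a 10-box field at 90% per character this gives about 0.35, so only about one field in three would be read without any mistake.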
Also here is a good academic article written on this subject: