
Most accurate open-source OCR for handwritten numbers? [closed]

My software needs to read a fixed-length handwritten number.

While I could use a general-purpose library like Tesseract, I am sure there is something smarter. Tesseract will probably misread some 1s or 7s as I or l, whereas software that expects only digits would not.

Knowing that the field contains only digits (written the American-English way), the algorithm could focus on 10 candidate symbols instead of hundreds.

Any experience OCRing handwritten number-only fields?
What open source library/software did you get the best results with?

Nicolas Raoul asked Apr 01 '10 07:04


1 Answer

From the FAQ of Tesseract:

How do I recognize only digits?

In 2.03 and above:

Use

TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

before calling an Init function or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

and then your command line becomes:

tesseract image.tif outputbase nobatch digits

Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.
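
The config-file route from the FAQ can also be scripted. The following is a minimal Python sketch and not part of the Tesseract FAQ: the helper names write_digits_config, build_digits_command and run_digits_ocr are hypothetical, and it assumes a tesseract binary on the PATH that accepts the command line quoted above and finds the digits config in its tessdata directory.

```python
import subprocess
from pathlib import Path

# The exact line the FAQ says to put in tessdata/configs/digits
WHITELIST_LINE = "tessedit_char_whitelist 0123456789\n"


def write_digits_config(tessdata_dir):
    """Create <tessdata_dir>/configs/digits holding the whitelist line."""
    config_path = Path(tessdata_dir) / "configs" / "digits"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(WHITELIST_LINE)
    return config_path


def build_digits_command(image, outputbase):
    """Return the argument list for the command line quoted in the FAQ."""
    return ["tesseract", str(image), str(outputbase), "nobatch", "digits"]


def run_digits_ocr(image, outputbase):
    """Run the command; assumes the tesseract binary is installed (assumption)."""
    subprocess.run(build_digits_command(image, outputbase), check=True)
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before invoking Tesseract.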

However, since Tesseract was designed for printed text, not handwriting, I think accuracy might suffer even when it is restricted to digits.
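
Because the number in the question has a fixed length, one cheap safeguard is to reject any OCR result that is not exactly that many digits, rather than trusting a doubtful read. Below is a minimal Python sketch of such a check; parse_fixed_length_number and its parameters are hypothetical names, and the function is a post-processing step that does not depend on any particular OCR library.

```python
DIGITS = set("0123456789")


def parse_fixed_length_number(ocr_text, expected_length):
    """Return the digit string if ocr_text holds exactly expected_length
    ASCII digits (ignoring whitespace), otherwise None."""
    # Drop all whitespace, including spaces or newlines the OCR engine may emit
    candidate = "".join(ocr_text.split())
    if len(candidate) != expected_length:
        return None
    # Reject letters such as "l" or "I" that slipped through instead of guessing
    if not all(ch in DIGITS for ch in candidate):
        return None
    return candidate
```

A None result can then be routed to manual review or a second recognition pass instead of being stored as a wrong number.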

Joey answered Sep 20 '22 14:09