
OCR and word reviewing

Tags: ocr, tesseract

I'm using Tesseract for my letter recognition project, and the recognition is currently quite good. The image processing is done with the OpenCV libraries. The letters are handwritten. However, there are problems distinguishing the letter "O" from the digit "0". These characters appear in data fields such as name fields, which cannot contain digits, and date-of-birth fields, which contain only digits. So I would like to restrict the recognition system so that each data field accepts only digits or only letters, as appropriate.

I would also like to check the recognised letters against possible words to improve the accuracy of the data. I would like to use the OpenCV libraries for this task, but I don't know which libraries can help with it or what functionality they offer. Could someone please help? Thank you.

Regards, Thilanka.

Thilanka asked Mar 07 '10 11:03



1 Answer

I've never used Tesseract. However, the FAQ says:

How do I recognise only digits?

TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

Presumably you could follow the pattern of this FAQ entry and set a different whitelist per field, so that name fields recognise only letters and date-of-birth fields recognise only digits.
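As a rough sketch of that idea (not tested against Tesseract itself), the whitelist string could be chosen per field type and then passed as the value of "tessedit_char_whitelist" in the FAQ's SetVariable call. The FieldKind enum and the helper names whitelistFor and fixOAndZero are made up for illustration, and the exact SetVariable signature may differ between Tesseract versions. The second helper is an optional fallback that swaps "O" and "0" after recognition, in case the whitelist alone is not enough.

```cpp
#include <cassert>
#include <string>

// Hypothetical field kinds for the form described in the question.
enum class FieldKind { Name, DateOfBirth };

// Returns the characters allowed in a field of the given kind.
// The result is meant to be used as the value of the
// "tessedit_char_whitelist" variable shown in the FAQ entry, e.g.
//   SetVariable("tessedit_char_whitelist", whitelistFor(kind).c_str());
// (assumption: the call takes C strings, as in the FAQ snippet).
std::string whitelistFor(FieldKind kind) {
    switch (kind) {
        case FieldKind::Name:
            return "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
        case FieldKind::DateOfBirth:
            return "0123456789";
    }
    return "";  // unreachable for the kinds above; keeps compilers quiet
}

// Optional post-processing: replaces the character that cannot occur in
// the field with its look-alike ('0' -> 'O' in names, 'O'/'o' -> '0' in dates).
std::string fixOAndZero(std::string text, FieldKind kind) {
    for (char &c : text) {
        if (kind == FieldKind::Name && c == '0') {
            c = 'O';
        } else if (kind == FieldKind::DateOfBirth && (c == 'O' || c == 'o')) {
            c = '0';
        }
    }
    return text;
}
```

For example, fixOAndZero("J0HN", FieldKind::Name) yields "JOHN", and fixOAndZero("2O1O", FieldKind::DateOfBirth) yields "2010".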

If you have already tried this, can you give more details of why it doesn't work?

Nick Fortescue answered Oct 20 '22 17:10
