Tesseract handwriting with dictionary training

I have a dictionary of words in a text file, one word per line. I want to recognize handwriting using Tesseract and output the nearest matching line from that file.
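One way to implement the "nearest matching line" step is to post-process whatever text Tesseract returns. For example, you can get that text from the `tesseract` CLI or from the pytesseract wrapper's `image_to_string`. Then pick the dictionary entry with the smallest edit distance. The sketch below assumes the OCR text is already available as a string. The function names (`levenshtein`, `load_dictionary`, `nearest_entry`) and the sample words are hypothetical, chosen for illustration. It uses plain Levenshtein distance with case folding, which is only one reasonable similarity measure.

```python
from pathlib import Path


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def load_dictionary(path: str) -> list[str]:
    """Read the word list: one entry per line, blank lines skipped."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def nearest_entry(ocr_text: str, entries: list[str]) -> str:
    """Return the entry closest to the OCR output (ties go to the earliest line).

    Raises ValueError if `entries` is empty.
    """
    cleaned = ocr_text.strip().lower()
    return min(entries, key=lambda entry: levenshtein(cleaned, entry.lower()))


if __name__ == "__main__":
    words = ["handwriting", "dictionary", "training", "tesseract"]
    # Hypothetical misread of "training" coming back from Tesseract.
    print(nearest_entry("traimng", words))
```

This scans the whole list for every lookup, which is fine for a modest dictionary. For a quick alternative, Python's standard `difflib.get_close_matches(word, possibilities, n=1)` does a similar ranking using a different similarity ratio.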

This is my first time using Tesseract. It's already in my project workspace; I just need the training data.

Is it possible to train Tesseract to do this?

asked Sep 07 '12 by Ruel



1 Answer

It's possible to train Tesseract to recognize handwriting. Here are the instructions: https://tesseract-ocr.github.io/tessdoc/Training-Tesseract

But don't expect very good results. Academics have typically reported accuracy topping out at about 90%. Here are a couple of references for words and numbers. So if your use case can tolerate roughly one error in ten, this might work for you.

answered Oct 13 '22 by Leopd