 

Tesseract OCR user patterns


Is there any way to get Tesseract to match only user-specified words or patterns? The manual claims it is possible, yet I cannot find a single documented instance on the internet of somebody getting this working.

Here are many examples of people asking for help because it does not work; none of them has a proven resolution.

stackoverflow.com/questions/33429143/tesseract-user-pattern-is-not-applied

stackoverflow.com/questions/31874393/tesseract-ocr-force-pattern

stackoverflow.com/questions/26856349/provide-pattern-for-tesseract

stackoverflow.com/questions/22432194/tesseract-ocr-only-detect-user-words

stackoverflow.com/questions/17209919/tesseract-user-patterns

groups.google.com/forum/#!topic/tesseract-ocr/S9CIK3jOMWw

groups.google.com/forum/#!topic/tesseract-ocr/5vFqVcJmHnM

So can we conclude that this feature simply does not work? Is there an official statement to this effect?

Asked Jan 01 '16 22:01 by Michael Connor


People also ask


How does a Tesseract OCR engine work?

Tesseract tests the text lines to determine whether they are fixed pitch. Where it finds fixed pitch text, Tesseract chops the words into characters using the pitch, and disables the chopper and associator on these words for the word recognition step.

What is OEM in Tesseract?

Engine Mode (--oem). Tesseract has several engine modes with different performance and speed. Tesseract 4 introduced an additional LSTM neural-net mode, which often works best.


1 Answer

There is now an example on the Tesseract doc site at https://tesseract-ocr.github.io/tessdoc/APIExample-user_patterns.html [Thanks @Ravi for the new link]

That test example does work for me in the oem=1 / LSTM mode of Tesseract 4.x.

I can't, however, get it to work for any other examples or in any other modes.

I have seen no official statement, and at the time of writing the feature does indeed seem to be non-functional.
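For anyone who wants to reproduce the test, here is a minimal sketch of driving the Tesseract 4.x command-line tool from Python in LSTM mode (`--oem 1`) with a user-patterns file. It assumes the `tesseract` binary is on PATH and that its `--user-patterns` / `--user-words` options (listed by `tesseract --help-extra`) are the mechanism in question. The pattern `\d\d\d-\d\d\d\d` and the helper names are hypothetical, chosen for illustration only. Given the findings above, a successful run does not prove the patterns actually constrain the output.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional

# Hypothetical example pattern: \d is Tesseract's user-pattern code for a digit.
EXAMPLE_PATTERNS = [r"\d\d\d-\d\d\d\d"]


def write_lines(path: Path, lines: List[str]) -> Path:
    """Write one entry per line (the format of user-words/user-patterns files)."""
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


def build_tesseract_args(image: str, patterns_file: str,
                         words_file: Optional[str] = None,
                         lang: str = "eng", oem: int = 1) -> List[str]:
    """Build a tesseract CLI call that prints recognised text to stdout.

    oem=1 selects the LSTM-only engine, the mode in which the documented
    example was reported to work.
    """
    args = ["tesseract", image, "stdout", "-l", lang, "--oem", str(oem)]
    if words_file is not None:
        args += ["--user-words", words_file]
    args += ["--user-patterns", patterns_file]
    return args


def run_ocr(image: str, patterns: List[str]) -> str:
    """Write the patterns to a temporary file and run tesseract on the image."""
    with tempfile.TemporaryDirectory() as tmp:
        pfile = write_lines(Path(tmp) / "user.patterns", patterns)
        args = build_tesseract_args(image, str(pfile))
        result = subprocess.run(args, capture_output=True, text=True, check=True)
        return result.stdout
```

Usage would be `print(run_ocr("scan.png", EXAMPLE_PATTERNS))`, where `scan.png` is a placeholder image path. Building the argument list separately keeps the invocation easy to inspect and to compare against the documented example.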

Answered Sep 20 '22 19:09 by jtlz2