Is it possible to limit the set of characters that tesseract is looking for (e.g. search only for letters a-z)? That would improve my results greatly.


Limit characters tesseract is looking for

1 Answer

Create a config file (e.g. "letters") in the tessdata/configs directory, usually /usr/share/tesseract/tessdata/configs or /usr/share/tesseract-ocr/tessdata/configs.

Then add this line to the config file:

tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz

Note that the whitelist is a literal list of characters: a range such as [a-z] is not expanded, so spell out every character. Then call tesseract like this:

tesseract input.tif output nobatch letters

That restricts tesseract to recognizing only the whitelisted characters.
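For scripting the same recipe, here is a minimal Python sketch. It assumes the `tesseract` binary is on PATH; the helper names `write_whitelist_config` and `build_tesseract_cmd` are hypothetical, and the config is written to a temporary directory instead of the system tessdata/configs path so the sketch runs without root. The whitelist is built from `string.ascii_lowercase`, so every letter is spelled out. It also shows Tesseract's `-c VAR=VALUE` option, which sets `tessedit_char_whitelist` on the command line without a config file (available in recent Tesseract releases).

```python
import shutil
import string
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def write_whitelist_config(config_dir: Path, name: str, chars: str) -> Path:
    """Write a Tesseract config file that whitelists exactly `chars`.

    Hypothetical helper. On a real system `config_dir` would be
    tessdata/configs (e.g. /usr/share/tesseract-ocr/tessdata/configs),
    which usually needs root to write.
    """
    # Drop duplicate characters while keeping their original order.
    unique = "".join(dict.fromkeys(chars))
    path = config_dir / name
    # Tesseract config files hold one "variable value" pair per line.
    path.write_text(f"tessedit_char_whitelist {unique}\n", encoding="utf-8")
    return path


def build_tesseract_cmd(image: str, output_base: str,
                        configs: Optional[List[str]] = None,
                        whitelist: Optional[str] = None) -> List[str]:
    """Build argv for: tesseract IMAGE OUTPUTBASE [-c VAR=VALUE] [CONFIG...].

    Hypothetical helper. The -c option sets a variable inline, so no
    config file is needed for the whitelist.
    """
    cmd = ["tesseract", image, output_base]
    if whitelist is not None:
        cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    # Config file names go last, after any options.
    cmd += configs or []
    return cmd


if __name__ == "__main__":
    # A temporary directory stands in for tessdata/configs here.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_whitelist_config(Path(tmp), "letters", string.ascii_lowercase)
        print(cfg.read_text(encoding="utf-8"), end="")

    # Equivalent of: tesseract input.tif output nobatch letters
    cmd = build_tesseract_cmd("input.tif", "output", ["nobatch", "letters"])
    print(" ".join(cmd))

    # Inline alternative using -c instead of a config file.
    alt = build_tesseract_cmd("input.tif", "output",
                              whitelist=string.ascii_lowercase)
    print(" ".join(alt))

    # Run only when tesseract is installed and the input image exists.
    # The temporary "letters" config is gone by now, so use the inline variant.
    if shutil.which("tesseract") and Path("input.tif").exists():
        subprocess.run(alt, check=True)
```

Passing the command to `subprocess.run` as a list, rather than one shell string, avoids quoting problems if the whitelist ever contains characters the shell treats specially.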

135

answered Sep 26 '22 17:09

Blomman



Tags: ocr, tesseract

Danilo Bargen
