
How to use trained data with pytesseract?

Using the tool at http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a *.traineddata file.

Right now I'm using this simple script:

try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract as tes

results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
# Append the OCR output to a text file; "with" closes the file afterwards.
with open('parsing.text', 'a') as out_file:
    out_file.write(results)
print(results)

How do I use my traineddata file so that the Python script can read the new font?

Thanks!

Edit #1: I understand that a *.traineddata file can be used with Tesseract as a command-line program, so my question is still the same: how do I use traineddata with Python?

Edit #2: the answer to my question is here: "How to access the command line for Tesseract from Python?"
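For reference, here is a minimal sketch of calling the Tesseract command line from Python with a custom language. It assumes the tesseract binary is on the PATH and that it accepts "stdout" as the output base (recent versions do). The function names, "test.jpg", "myfont" and "/opt/fonts" are placeholders for illustration, not something from the linked answer.

```python
import subprocess


def build_tesseract_command(image_path, lang, tessdata_dir=None):
    """Return the argument list for: tesseract <image> stdout -l <lang>."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if tessdata_dir is not None:
        # Directory that contains <lang>.traineddata (placeholder path).
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_tesseract(image_path, lang, tessdata_dir=None):
    """Run tesseract and return the recognized text (needs tesseract installed)."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang, tessdata_dir),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

With a file /opt/fonts/myfont.traineddata, the call would look like run_tesseract("test.jpg", "myfont", "/opt/fonts").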

Simon Breton asked May 25 '17 14:05



People also ask

What data is Tesseract trained on?

Tesseract uses training data to perform OCR. Most systems default to English training data. To improve OCR performance for other languages, you can install the training data from your distribution.

Does Pytesseract need Tesseract?

You can use pytesseract to convert images into text. Pytesseract is a Python package that works with tesseract, which is a command-line optical character recognition (OCR) program. It's a super cool package that can read the text contained in pictures.
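Since pytesseract only wraps the command-line program, a script can check that the binary is available before attempting OCR. This is a small sketch using only the standard library; the function name find_tesseract is made up for illustration, and it assumes the executable is called "tesseract".

```python
import shutil


def find_tesseract(binary_name="tesseract"):
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; pytesseract cannot run without it")
    else:
        print("tesseract found at", path)
```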


1 Answer

Below is a sample of pytesseract.image_to_string() with options.

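from PIL import Image
import pytesseract

# lang selects the .traineddata file; config is passed through to the tesseract CLI.
text = pytesseract.image_to_string(
    Image.open("./imagesStackoverflow/xyz-small-gray.png"),
    lang="eng",
    config="--psm 4 --oem 3 -c tessedit_char_whitelist=-01234567890XYZ:")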

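To use your own trained language data, just replace "eng" in lang="eng" with your language name, that is, the name of your .traineddata file without the extension. Tesseract looks for that file in its tessdata directory.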
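As a rough sketch of that replacement, the helper below derives the lang value from the path of a .traineddata file and points Tesseract at the folder containing it through the --tessdata-dir option. The names tesseract_kwargs and read_with_custom_font, the file myfont.traineddata and the folder /opt/fonts are all assumptions made for the example.

```python
from pathlib import Path


def tesseract_kwargs(traineddata_path, psm=4):
    """Build the lang/config keyword arguments for pytesseract.image_to_string."""
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name!r}")
    return {
        # "myfont.traineddata" -> lang "myfont"
        "lang": path.stem,
        # Tell tesseract which directory holds the .traineddata file.
        "config": f'--tessdata-dir "{path.parent}" --psm {psm}',
    }


def read_with_custom_font(image_path, traineddata_path):
    """OCR an image with custom trained data (needs pytesseract and tesseract)."""
    # Imported here so tesseract_kwargs can be used without these packages.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(
        Image.open(image_path), **tesseract_kwargs(traineddata_path)
    )
```

If the file is copied into Tesseract's default tessdata directory instead, the --tessdata-dir part should not be needed and lang="myfont" alone is enough.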

thewaywewere answered Jan 02 '23 23:01
