How to use trained data with pytesseract?

Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a file called *.traineddata.

Right now I'm using this simple script:

try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract as tes

results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
# Append the OCR output to a file; the with-block closes it afterwards
with open('parsing.text', 'a') as file:
    file.write(results)
print(results)

How do I use my traineddata file so that I can read the new font from the Python script?

Thanks!

edit#1: I understand that *.traineddata can be used with Tesseract as a command-line program, so my question is still the same: how do I use traineddata with Python?

edit#2: the answer to my question is here: How to access the command line for Tesseract from Python?

951

asked May 25 '17 14:05

Simon Breton

1 Answer

Below is a sample of pytesseract.image_to_string() with options.

pytesseract.image_to_string(Image.open("./imagesStackoverflow/xyz-small-gray.png"),
                            lang="eng",
                            config="--psm 4 --oem 3 "
                                   "-c tessedit_char_whitelist=-01234567890XYZ:")

To use your own trained language data, just replace "eng" in lang="eng" with your language name (the name of your .traineddata file, without the extension).
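
As a rough sketch of that idea, the snippet below derives the lang value from the file name and points Tesseract at the folder holding the file through the --tessdata-dir option. The helper names tesseract_args and ocr_with_custom_font, and the example paths, are made up for illustration and are not part of pytesseract. It assumes Tesseract itself is installed and that your file is something like /opt/tessdata/myfont.traineddata.

```python
from pathlib import Path


def tesseract_args(traineddata_path):
    """Return (lang, config) for a .traineddata file.

    Hypothetical helper for illustration; not part of pytesseract.
    """
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {path.name!r}")
    # Tesseract's language name is the file name without the extension
    lang = path.stem
    # Tell Tesseract which directory contains the .traineddata file
    config = f'--tessdata-dir "{path.parent}"'
    return lang, config


def ocr_with_custom_font(image_path, traineddata_path):
    """Run OCR on image_path using the given trained data file."""
    # Imported here so tesseract_args() works without these packages
    from PIL import Image
    import pytesseract

    lang, config = tesseract_args(traineddata_path)
    return pytesseract.image_to_string(Image.open(image_path),
                                       lang=lang, config=config)


if __name__ == "__main__":
    # Example paths only; replace them with your own files
    print(ocr_with_custom_font("./test.jpg", "/opt/tessdata/myfont.traineddata"))
```

If the file is already in Tesseract's default tessdata directory, the --tessdata-dir part should not be needed and passing lang="myfont" alone is enough.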

129

answered Jan 02 '23 23:01

thewaywewere

Tags:

ocr

tesseract

python-tesseract
