Using this tool, http://trainyourtesseract.com/, I would like to be able to use new fonts with pytesseract. The tool gives me a *.traineddata file.
Right now I'm using this simple script:
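try:
    import Image  # legacy standalone PIL
except ImportError:
    from PIL import Image  # Pillow
import pytesseract as tes

# boxes=True requests character bounding boxes (accepted by older pytesseract releases)
results = tes.image_to_string(Image.open('./test.jpg'), boxes=True)
# append the OCR output to a file; the with-block closes it afterwards
with open('parsing.text', 'a') as file:
    file.write(results)
print(results)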
How do I use my traineddata file so that the Python script can read the new font?
Thanks!
Edit #1: I understand that a *.traineddata file can be used with Tesseract as a command-line program, so my question is still the same: how do I use traineddata with Python?
Edit #2: The answer to my question is here: How to access the command line for Tesseract from Python?
Details: Tesseract uses training data to perform OCR. Most systems default to English training data. To improve OCR performance for other languages, you can install the corresponding training data from your distribution.
You can use pytesseract to convert images into text. Pytesseract is a Python package that works with tesseract, which is a command-line optical character recognition (OCR) program. It's a super cool package that can read the text contained in pictures.
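Below is a sample call to pytesseract.image_to_string() with options.
pytesseract.image_to_string(
    Image.open("./imagesStackoverflow/xyz-small-gray.png"),
    lang="eng",
    boxes=False,  # the default; this parameter exists only in older pytesseract releases
    # the whole config must be one string: page segmentation mode, engine mode, character whitelist
    config="--psm 4 --oem 3 -c tessedit_char_whitelist=-01234567890XYZ:",
)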
To use your own trained language data, just replace "eng" in lang="eng" with your language name, that is, the name of your .traineddata file without the extension.
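As a rough sketch of that last step, the snippet below derives the lang value from a .traineddata path and points Tesseract at the folder that contains it through the --tessdata-dir command-line option. The helper names tesseract_args and ocr_with_custom_font, and the paths in the usage line, are made up for illustration; it also assumes pytesseract and Pillow are installed and that your Tesseract version accepts --tessdata-dir.

```python
from pathlib import Path


def tesseract_args(traineddata_path):
    """Return (lang, config) for a custom .traineddata file (hypothetical helper)."""
    path = Path(traineddata_path)
    if path.suffix != ".traineddata":
        raise ValueError("expected a .traineddata file, got: %s" % path.name)
    # Tesseract expects the language name without the extension.
    lang = path.stem
    # Tell Tesseract which directory holds the file (assumes --tessdata-dir is supported).
    config = '--tessdata-dir "%s"' % path.parent
    return lang, config


def ocr_with_custom_font(image_path, traineddata_path):
    """Run OCR on an image with the custom language data (hypothetical helper)."""
    # Imported here so the helper above can be used without these packages.
    import pytesseract
    from PIL import Image

    lang, config = tesseract_args(traineddata_path)
    return pytesseract.image_to_string(Image.open(image_path), lang=lang, config=config)
```

For example, ocr_with_custom_font('./test.jpg', '/opt/tessdata/myfont.traineddata') would call Tesseract with lang="myfont". If the file is instead copied into Tesseract's default tessdata directory, passing lang alone should be enough.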