After that, move the traineddata file into your tessdata folder. To use Tesseract with the new font in Python (or any other language, I think?), pass lang="Font" as the second parameter of the image_to_string function. It improves accuracy significantly, but it can still make mistakes, of course.
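Here is a rough sketch of those two steps in Python. It assumes the pytesseract wrapper and Pillow are what you call image_to_string through (the post does not name the library), and the helper names install_traineddata and read_with_font are made up for illustration. It copies the file instead of moving it, so the original stays where it was.

```python
import shutil
from pathlib import Path


def install_traineddata(traineddata_file, tessdata_dir):
    """Copy a .traineddata file into the tessdata folder and return its language name."""
    src = Path(traineddata_file)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name}")
    dest_dir = Path(tessdata_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Copy rather than move, so the original file is left untouched.
    shutil.copy2(src, dest_dir / src.name)
    # The language name is the file name without the extension:
    # Font.traineddata -> "Font"
    return src.stem


def read_with_font(image_path, lang):
    """OCR an image with the given language (assumes pytesseract and Pillow are installed)."""
    # Imported lazily so install_traineddata works without these packages.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(image_path), lang=lang)
```

Usage would look something like lang = install_traineddata("Font.traineddata", tessdata_dir) followed by read_with_font("sample.png", lang), where tessdata_dir is wherever your Tesseract install keeps its tessdata folder and sample.png is a placeholder image name.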
jTessBoxEditor is a box editor and trainer for Tesseract OCR. It provides box data editing for both Tesseract 2.0x and 3.0x formats, and full automation of Tesseract training. It can read images of common image formats, including multi-page TIFF. The program requires Java Runtime Environment 7 or later.
tesstrain.sh needs certain files for the training process. These are normally stored in a 'langdata' directory. The langdata for the languages officially supported by Tesseract is all stored in the langdata repository, but you can of course store langdata wherever you want.
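As a sketch of how a custom langdata location might be handed to the script, the snippet below only builds the argument list and does not run anything. The flag names (--lang, --langdata_dir, --tessdata_dir, --output_dir) are as I remember them from the Tesseract training docs, so check them against your copy of tesstrain.sh. The function name tesstrain_command is hypothetical.

```python
def tesstrain_command(lang, langdata_dir, tessdata_dir, output_dir):
    """Build the argument list for tesstrain.sh without executing it."""
    return [
        "tesstrain.sh",
        "--lang", lang,
        # Point the script at wherever you chose to keep langdata.
        "--langdata_dir", str(langdata_dir),
        "--tessdata_dir", str(tessdata_dir),
        "--output_dir", str(output_dir),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the paths are right for your machine.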
I'm trying to train Tesseract for a new font to use in my Android app. I only need to recognize digits, so I created one training image, a box file and a unicharset file.
I followed the training instructions, but when I try to run tesseract it says: bad read of inttemp!
What am I doing wrong? How can I diagnose this error?