I've downloaded PyTesser and extracted it. From inside the pytesser_v0.0.1 folder, I tried to run the sample usage code in the Python interpreter:
from pytesser import *
print image_file_to_string('fnord.tif')
and the output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pytesser.py", line 44, in image_file_to_string
    call_tesseract(filename, scratch_text_name_root)
  File "pytesser.py", line 21, in call_tesseract
    proc = subprocess.Popen(args)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1259, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
Note: I'm on Ubuntu 12.10 with Python 2.7.3.
Can anyone help me understand this error and what I can do to fix it?
Unfortunately, Tesseract cannot automatically detect the language of the text in an image. An alternative is another Python module called langdetect, which can be installed via pip.
Tesseract tests the text lines to determine whether they are fixed pitch. Where it finds fixed pitch text, Tesseract chops the words into characters using the pitch, and disables the chopper and associator on these words for the word recognition step.
Once installed, the training files will be on your C drive, likely in 'C:\Program Files (x86)\Tesseract-OCR'. The folder will be called 'Tesseract-Master'. You will need to unpack the files using a programme like 7-zip.
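This isn't as well documented as it could be, but if you are not on Windows you need to install the tesseract binary for your platform. The OSError: [Errno 2] in your traceback is raised by subprocess.Popen when it cannot find the executable it was asked to run, here tesseract. On Ubuntu and other Debian-based Linux distributions, run the following with root privileges (for example via sudo):
apt-get install tesseract-ocr
Then you can run:
python pytesser.py
which uses the test files phototest.tif, fnord.tif and fonts_test.png to test the library.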
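If you want to confirm the cause before or after installing, here is a minimal sketch that checks whether an executable is on your PATH and shows that starting a missing program produces the same errno 2. It assumes PyTesser launches a program named tesseract; the helper names find_on_path and popen_reports_missing, and the placeholder program name, are made up for illustration and are not part of PyTesser.

```python
from __future__ import print_function

import errno
import os
import subprocess


def find_on_path(program):
    """Return the full path of `program` if an executable file with that
    name exists in one of the PATH directories, otherwise None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def popen_reports_missing(program):
    """Try to start `program`; return True if the OS reports it as missing."""
    try:
        proc = subprocess.Popen([program])
    except OSError as exc:
        # errno.ENOENT is the "[Errno 2] No such file or directory"
        # seen in the traceback from the question.
        return exc.errno == errno.ENOENT
    proc.wait()
    return False


# Hypothetical name, chosen only because it should not exist anywhere.
print("missing program gives errno 2:",
      popen_reports_missing("no-such-program-for-demo"))

# Assumption: the binary PyTesser needs is called "tesseract".
tesseract_path = find_on_path("tesseract")
if tesseract_path is None:
    print("tesseract is not on PATH; install the tesseract-ocr package first")
else:
    print("tesseract found at", tesseract_path)
```

If the last line reports a path but image_file_to_string still fails, the problem is probably something other than a missing binary.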