I am getting the following error when trying to extract the text from a simple test image and print it.
I've verified that I have Pillow (PIL 1.1.7) and tried uninstalling and reinstalling pytesseract. The file paths are correct because if I change them I get another error saying that the file cannot be found.
My code:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\bbrown2\AppData\Local\Programs\Python\Python37\Scripts\pytesseract'
img = r'C:\Users\bbrown2\Desktop\test.png'
print(pytesseract.image_to_string(Image.open(img)))
I expect it to print out the words in the image but instead I always get this:
Traceback (most recent call last):
  File "c:\Users\bbrown2\Desktop\PythonMaterials\python_test_tesseract.py", line 14, in <module>
    print(pytesseract.image_to_string(Image.open(image)))
  File "C:\Users\bbrown2\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "C:\Users\bbrown2\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\bbrown2\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\bbrown2\AppData\Local\Programs\Python\Python37\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (2, 'Usage: python pytesseract.py [-l lang] input_file')
Pytesseract (Python-tesseract) is an OCR tool for Python that serves as a wrapper for the Tesseract-OCR engine. It can read and recognize text in images and is commonly used for image-to-text tasks in Python.
Point pytesseract at your Tesseract installation: create a Python script (a .py file) or start a Jupyter notebook. At the top of the file, import pytesseract, then point pytesseract at the location where Tesseract is installed.
On 64-bit Windows, assuming Tesseract is installed in its default folder, add "C:\Program Files\Tesseract-OCR" to the PATH environment variable; pytesseract should then find the executable on its own.
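One way to check whether the PATH change has taken effect is to ask Python where it would find the executable. This is a minimal sketch using only the standard library; the helper name find_on_path is made up for illustration and is not part of pytesseract. Note that a terminal or IDE opened before PATH was edited usually needs to be restarted to see the new value.

```python
import shutil


def find_on_path(program="tesseract"):
    """Return the full path of `program` if it is on PATH, else None."""
    return shutil.which(program)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("tesseract is not on PATH; add its install folder or set tesseract_cmd")
    else:
        print("tesseract found at", location)
```

If this reports that nothing was found, setting tesseract_cmd explicitly is the other option.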
The problem is that pytesseract is just a Python wrapper for the command-line program Tesseract. You're supposed to point tesseract_cmd at the actual Tesseract binary, not at the pytesseract CLI utility.
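To see why the error is a usage line rather than OCR output, it may help to picture what a wrapper does: it launches the configured command as a child process and reports the exit status and error text if that process fails. The sketch below is not pytesseract's actual code. The names run_command and FAKE_WRONG_CMD are made up, and the child process only imitates the status code and usage message shown in the traceback.

```python
import subprocess
import sys

# Stand-in for the wrong executable (an assumption for illustration): a child
# Python process that writes a usage line to stderr and exits with status 2.
FAKE_WRONG_CMD = [
    sys.executable,
    "-c",
    "import sys; "
    "sys.stderr.write('Usage: python pytesseract.py [-l lang] input_file'); "
    "sys.exit(2)",
]


def run_command(cmd):
    """Run cmd as a child process and return (exit status, stderr text)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.returncode, proc.stderr.decode("utf-8", errors="replace").strip()


if __name__ == "__main__":
    status, message = run_command(FAKE_WRONG_CMD)
    # A nonzero status means the command did not do the job it was asked to do.
    print((status, message))
```

The pair that comes back has the same shape as the arguments of the TesseractError in the question: an exit status of 2 plus whatever the wrongly chosen program printed.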
So, you'll need to install Tesseract. Windows builds are available. I chose the version 3.05 installer, and it installed by default to C:\Program Files (x86)\Tesseract-OCR\tesseract. Then, I ran the following and it worked fine:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = (
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract"
)
img = r"C:\Users\cody\Desktop\ocrtest.png"
print(pytesseract.image_to_string(Image.open(img)))
Test input:
Result:
The (quick) [brown] {fox} jumps!
Over the $43,456.78 <lazy> #90 dog
& duck/goose, as 12.5% of E-mail
from [email protected] is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra i] cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom répida
salta sobre 0 C50 preguicoso.
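Before running OCR, it can be worth sanity-checking the value you are about to assign to tesseract_cmd. The following is a rough sketch, not something pytesseract provides: check_tesseract_cmd is a hypothetical helper, and it assumes Windows-style paths and that the executable is named tesseract (with or without .exe).

```python
import os
from pathlib import PureWindowsPath


def check_tesseract_cmd(path):
    """Return a list of problems with `path`; an empty list means it looks plausible."""
    problems = []
    # Windows lets you omit ".exe", so accept either spelling on disk.
    if not (os.path.isfile(path) or os.path.isfile(path + ".exe")):
        problems.append("no such file: " + path)
    # PureWindowsPath splits on backslashes regardless of the current platform.
    stem = PureWindowsPath(path).stem.lower()
    if stem != "tesseract":
        problems.append("file name is '%s', expected 'tesseract'" % stem)
    return problems


if __name__ == "__main__":
    wrong = r"C:\Users\bbrown2\AppData\Local\Programs\Python\Python37\Scripts\pytesseract"
    for problem in check_tesseract_cmd(wrong):
        print(problem)
```

A check like this would have flagged the path from the question, because it names the pytesseract script rather than the Tesseract binary.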