I am using python 3.x and using the following code to convert image into text:
from PIL import Image
from pytesseract import image_to_string
image = Image.open('image.png', mode='r')
print(image_to_string(image))
I am getting the following error:
Traceback (most recent call last):
File "C:/Users/hp/Desktop/GII/Image_to_text.py", line 12, in <module>
print(image_to_string(image))
File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 161, in image_to_string
config=config)
File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\site-packages\pytesseract\pytesseract.py", line 94, in run_tesseract
stderr=subprocess.PIPE)
File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 950, in __init__
restore_signals, start_new_session)
File "C:\Users\hp\Downloads\WinPython-64bit-3.5.1.2\python-3.5.1.amd64\lib\subprocess.py", line 1220, in _execute_child
startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
Please note that I have put the image in the same directory where my python is present. Also It does not raise error on image = Image.open('image.png', mode='r')
but it raises on the line print(image_to_string(image))
.
Any idea what might be wrong here? Thanks
There are programs that use Optical Character Recognition (OCR) to analyze the letters and words in an image and then convert them to text. There are a number of reasons why you might want to use OCR technology to copy text from an image or PDF.
You have to have tesseract
installed and accesible in your path.
According to source, pytesseract
is merely a wrapper for subprocess.Popen
with tesseract binary as a binary to run. It does not perform any kind of OCR itself.
Relevant part of sources:
def run_tesseract(input_filename, output_filename_base, lang=None, boxes=False, config=None):
'''
runs the command:
`tesseract_cmd` `input_filename` `output_filename_base`
returns the exit status of tesseract, as well as tesseract's stderr output
'''
command = [tesseract_cmd, input_filename, output_filename_base]
if lang is not None:
command += ['-l', lang]
if boxes:
command += ['batch.nochop', 'makebox']
if config:
command += shlex.split(config)
proc = subprocess.Popen(command,
stderr=subprocess.PIPE)
return (proc.wait(), proc.stderr.read())
Quoting another part of source:
# CHANGE THIS IF TESSERACT IS NOT IN YOUR PATH, OR IS NAMED DIFFERENTLY
tesseract_cmd = 'tesseract'
So quick way of changing tesseract path would be:
import pytesseract
pytesseract.tesseract_cmd = "/absolute/path/to/tesseract" # this should be done only once
pytesseract.image_to_string(img)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With