Empty string with Tesseract

I'm trying to read different cropped images from a big file. I manage to read most of them, but some return an empty string when I try to read them with Tesseract.

[Image: string to read with Tesseract]

The code is just this line:

pytesseract.image_to_string(cv2.imread("img.png"), lang="eng")

Is there anything I can try to be able to read this kind of image?

Thanks in advance

Edit: [second image]

asked Dec 15 '18 20:12

Alberto Carmona

1 Answer

Thresholding the image before passing it to pytesseract often improves accuracy, because it separates the text from the background.

import cv2
import numpy as np
import pytesseract
from PIL import Image

# Load the image as grayscale
img = Image.open('num.png').convert('L')
# Binarize: pixels above 125 become 255 (white), the rest become 0 (black)
ret, img = cv2.threshold(np.array(img), 125, 255, cv2.THRESH_BINARY)

# Older versions of pytesseract need a Pillow image,
# so convert back if needed
img = Image.fromarray(img.astype(np.uint8))

print(pytesseract.image_to_string(img))

This printed out

5.78 / C02
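
To make the thresholding step concrete, here is a minimal NumPy sketch of what cv2.THRESH_BINARY computes: every pixel strictly above the cutoff becomes the maximum value, everything else becomes 0. The helper name binary_threshold and the tiny sample array are made up for illustration; they are not part of OpenCV or of the answer's code.

```python
import numpy as np

def binary_threshold(gray, thresh=125, maxval=255):
    """Return a uint8 image: maxval where gray > thresh, 0 elsewhere.

    Hypothetical helper mirroring the rule used by cv2.THRESH_BINARY.
    """
    gray = np.asarray(gray)
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

# Tiny synthetic "image": dark text pixels (low values) on a light background
sample = np.array([[30, 125, 126],
                   [200, 90, 255]], dtype=np.uint8)

binary = binary_threshold(sample)
print(binary)
```

Note that a pixel equal to the cutoff (125 here) ends up black, since the comparison is strict. If the text in your crops is faint, the cutoff of 125 may need adjusting for your images.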

Edit: Thresholding alone on the second image returns 11.1. Another step that can help is setting the page segmentation mode to "Treat the image as a single text line" with the config --psm 7. Doing this on the second image returns 11.1 "202 ', with the quotation marks coming from the partial text at the top. To ignore those, you can also restrict which characters Tesseract looks for by passing a whitelist with the config -c tessedit_char_whitelist=0123456789.%. Everything together:

pytesseract.image_to_string(img, config='--psm 7 -c tessedit_char_whitelist=0123456789.%')

This returns 11.1 202. Clearly pytesseract is having a hard time with the percent symbol, and I'm not sure how to improve that with image processing or config changes.
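
If you need to try several combinations of page segmentation mode and whitelist across many crops, it may help to build the config string in one place. The following is only a sketch: build_config and read_single_line are hypothetical helper names, not part of pytesseract; the only pytesseract call used is image_to_string with its config argument, as in the line above.

```python
def build_config(psm=7, whitelist="0123456789.%"):
    """Assemble a Tesseract config string (hypothetical helper).

    psm: page segmentation mode (7 = treat the image as a single text line).
    whitelist: characters Tesseract may output; pass "" to skip the whitelist.
    """
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)

def read_single_line(img, psm=7, whitelist="0123456789.%"):
    """Run OCR on an already thresholded image with the assembled config."""
    # Imported here so build_config can be used without pytesseract installed
    import pytesseract
    text = pytesseract.image_to_string(img, config=build_config(psm, whitelist))
    return text.strip()

print(build_config())
```

With the defaults, build_config() produces the same config string as the call above, so read_single_line(img) should behave like that call apart from stripping surrounding whitespace.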

answered Sep 28 '22 10:09

A Kruger

Tags: python, opencv, ocr, tesseract, python-tesseract