Tesseract OCR fails to detect varying font size and letters that are not horizontally aligned

Tags:

python

opencv

ocr

tesseract

I am trying to read the text on these price labels, which are always cleanly preprocessed. Tesseract easily reads the text written above the price, but it fails to detect the price itself. I am using the pytesseract Python bindings, although the CLI fails in the same way. Most of the time it recognizes the price part as just one or two characters.

Sample 1:

tesseract D:\tesseract\tesseract_test_images\test.png output

The output for the sample image is:

je Beutel

13

However, if I crop and stretch the price so that the characters are separated and have the same font size, the output is just fine.

Processed image (cropped and shrunk price):

je Beutel

1,89

How do I get Tesseract OCR to work as intended? I will be going over a lot of similar images.

Edit: Added more price tags (sample2 to sample7).


asked Mar 28 '18 13:03

NONONONONO

2 Answers

The problem is that the image you are using is small. When Tesseract processes it, it either treats '8', '9' and ',' as a single character and predicts '3', or treats '8' and ',' as one character and '9' as another, and so produces the wrong output. The image below illustrates this.

[Image: detected contours of the original (small) image]

A simple solution is to enlarge the image by a factor of 2 or 3 (or more, depending on the size of your original image) before passing it to Tesseract, so that it detects each character individually, as shown below. (Here I enlarged it by a factor of 2.)

[Image: detected contours of the resized (larger) image]

Below is a simple Python script that does this:

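import cv2
import pytesseract

# Load the price tag image (cv2.imread returns None if the file cannot be read).
img = cv2.imread('dKC6k.png')
# Enlarge by a factor of 2 in both directions so each character is segmented separately.
img = cv2.resize(img, None, fx=2, fy=2)

# Run OCR on the enlarged image.
data = pytesseract.image_to_string(img)
print(data)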

Detected text:

je Beutel

89
1.

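Now you can simply extract the required data from the text and format it as you need. Note that for this image Tesseract returns the cents before the whole units.

# Drop blank lines so that the indices below are stable.
lines = [line.strip() for line in data.splitlines() if line.strip()]

# lines[0] is 'je Beutel', lines[1] is the cents ('89'), lines[2] is the whole units ('1.').
dollars = lines[2].strip(',.')
cents = lines[1]

print('{}.{}'.format(dollars, cents))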

Desired Format:

1.89
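The index-based extraction above depends on Tesseract returning exactly those lines in that order. A slightly more defensive sketch could match the tokens by shape instead. It rests on assumptions that are mine, not the answer's: the whole units come out as one to three digits followed by '.' or ',', and the cents as a separate two-digit token. The function name `parse_price` is hypothetical.

```python
import re

def parse_price(ocr_text):
    """Return the price as 'units.cents', or None if no price is found.

    Assumption: the whole units are read as digits followed by ',' or '.'
    (e.g. '1.') and the cents as a separate two-digit token (e.g. '89'),
    in either order.
    """
    units = None
    cents = None
    for token in ocr_text.split():
        if re.fullmatch(r"\d{1,3}[.,]\d{2}", token):
            # The price was already read in one piece, e.g. '1,89'.
            return token.replace(",", ".")
        if units is None and re.fullmatch(r"\d{1,3}[.,]", token):
            units = token[:-1]
        elif cents is None and re.fullmatch(r"\d{2}", token):
            cents = token
    if units is None or cents is None:
        return None
    return "{}.{}".format(units, cents)

print(parse_price("je Beutel\n\n89\n1.\n"))  # 1.89
```

Returning None when either part is missing (as with the original misread "13") makes failed reads easy to flag for manual review when processing many images.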


answered Nov 02 '22 07:11

Shivam K. Thakkar

The problem is that the Tesseract engine was not trained to read this kind of text topology.

You can:

Train your own model. In particular, you will need to provide images with variations in topology (position of the characters). You can actually reuse the same image and shuffle the positions of the characters.
Reorganize the image into clusters of text and then run Tesseract. In particular, I would take the cents part and move it to the right of the comma; in that case you can use Tesseract out of the box. A few relevant criteria would be the height of the clusters (to differentiate cents from integers) and the position of the clusters (read from left to right).

In general, computer vision algorithms (including CNNs) give you tools to obtain a higher-level representation of an image (features or descriptors), but they do not create the logic or algorithm needed to process intermediate results in a particular way.

In your case that would be:

"if the height of those characters is smaller, they are the cents",
"if the height and vertical position are the same, the characters belong to the same number, either to the left or to the right of the comma".

The thing is that it is difficult to reach this through training, while it is extremely simple for a human to write it as an algorithm. Sorry for not giving you an actual implementation, but my text is the pseudocode.
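The two rules above can be sketched directly in code. The sketch assumes that each character has already been located and recognized individually (for example from contours or per-character OCR boxes) and is passed in as a hypothetical `(char, x, height)` tuple. It uses the height only and ignores vertical position, and the 0.8 height ratio is an arbitrary threshold rather than a value from this answer.

```python
def assemble_price(boxes, ratio=0.8):
    """Build 'units.cents' from per-character boxes.

    boxes: list of (char, x, height) tuples (hypothetical input format).
    Digits shorter than ratio * tallest digit are treated as cents.
    """
    digits = [b for b in boxes if b[0].isdigit()]
    if not digits:
        return None
    tallest = max(b[2] for b in digits)
    # Rule 1: the smaller characters are the cents.
    cents = [b for b in digits if b[2] < ratio * tallest]
    # Rule 2: characters of (roughly) full height form the integer part.
    units = [b for b in digits if b[2] >= ratio * tallest]
    # Read each cluster from left to right.
    units_text = "".join(b[0] for b in sorted(units, key=lambda b: b[1]))
    cents_text = "".join(b[0] for b in sorted(cents, key=lambda b: b[1]))
    if not cents_text:
        return units_text
    return "{}.{}".format(units_text, cents_text)

# Made-up boxes resembling the '1,89' label: a tall '1', a comma, small '8' and '9'.
sample = [("9", 70, 20), ("1", 10, 50), (",", 35, 8), ("8", 50, 20)]
print(assemble_price(sample))  # 1.89
```

The comma is ignored on purpose: once the clusters are separated by height, the decimal separator is implied by the boundary between them.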

TrainingTesseract2

TrainingTesseract4

Joint Unsupervised Learning of Deep Representations and Image Clusters

answered Nov 02 '22 08:11

Soleil
