Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to improve OCR with Pytesseract text recognition?

Tags:

Hi I am looking to improve my performance with pytesseract at digit recognition.

I take my raw image and split it into parts that look like this:

Image1

The size can vary.

To this I apply some pre-processing methods like so

image = cv2.imread(im, cv2.IMREAD_GRAYSCALE)
image = cv2.GaussianBlur(image, (1, 1), 0)
kernel = np.ones((5, 5), np.uint8)
result_img = cv2.blur(img, (2, 2), 0)
result_img = cv2.dilate(result_img, kernel, iterations=1)
result_img = cv2.erode(result_img, kernel, iterations=1)

and I get this

Image2

I then pass this to pytesseract:

num = pytesseract.image_to_string(result_img, lang='eng',
                                     config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')

However this is not good enough for me and often gets numbers wrong.

I am looking for ways to improve, I have tried to keep this minimal and self contained but let me know if I've not been clear and I will elaborate.

Thank you.

like image 664
tepsupek Avatar asked Mar 10 '20 18:03

tepsupek


1 Answers

You're on the right track by trying to preprocess the image before performing OCR but using an incorrect approach. There is no reason to dilate or erode the image since these operations are mainly used for removing small noise particles. In addition, your current output is not a binary image. It may look like it only contains black and white pixels but it is actually a 3-channel BGR image which is probably why you're getting incorrect OCR results. If you look at Tesseract improve quality, you will notice that for Pytesseract to perform optimal OCR, the image needs to be preprocessed so that the desired text to detect is in black with the background in white. To do this, we can perform a Otsu's threshold to obtain a binary image then invert it so the text is in the foreground. This will result in our preprocessed image where we can throw it into image_to_string. We use the --psm 6 configuration option to assume a single uniform block of text. Take a look at configuration options for more settings. Here's the results:

Input image -> Binary -> Invert

enter image description here enter image description here enter image description here

Result from Pytesseract OCR

8

Code

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load image, grayscale, Otsu's threshold, invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh

# OCR
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
like image 99
nathancy Avatar answered Oct 13 '22 21:10

nathancy