Hi, I am looking to improve the accuracy of pytesseract at digit recognition.
I take my raw image and split it into parts that look like this:
The size can vary.
To this I apply some pre-processing steps, like so:
image = cv2.imread(im, cv2.IMREAD_GRAYSCALE)
image = cv2.GaussianBlur(image, (1, 1), 0)
kernel = np.ones((5, 5), np.uint8)
result_img = cv2.blur(image, (2, 2), 0)
result_img = cv2.dilate(result_img, kernel, iterations=1)
result_img = cv2.erode(result_img, kernel, iterations=1)
and I get this:
I then pass this to pytesseract:
num = pytesseract.image_to_string(result_img, lang='eng',
config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
However, this is not good enough for me, and it often gets numbers wrong.
I am looking for ways to improve it. I have tried to keep this minimal and self-contained, but let me know if I've not been clear and I will elaborate.
Thank you.
You're on the right track in preprocessing the image before performing OCR, but the approach is incorrect. There is no reason to dilate or erode the image, since these operations are mainly used for removing small noise particles. In addition, your current output is not a binary image. It may look like it contains only black and white pixels, but it was never thresholded, and the posted image is actually a 3-channel BGR image, which is probably why you're getting incorrect OCR results. If you look at Tesseract's guidance on improving quality, you will notice that for Pytesseract to perform optimal OCR, the image needs to be preprocessed so that the text to detect is black and the background is white. To do this, we can apply Otsu's threshold to obtain a binary image, then invert it so the text is black on a white background. This gives us the preprocessed image, which we can pass to image_to_string. We use the --psm 6 configuration option to assume a single uniform block of text. Take a look at the configuration options for more settings. Here are the results:
Input image -> Binary -> Invert
Result from Pytesseract OCR
8
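If the recognized digit is needed as a number rather than just printed, note that the string returned by image_to_string typically carries trailing whitespace (a newline and, with recent Tesseract versions, a form feed). Below is a minimal sketch of a cleanup step. The helper name digits_only is hypothetical and is not part of pytesseract.

```python
import re


def digits_only(raw: str) -> str:
    """Keep only the ASCII digits 0-9 from an OCR result string.

    Hypothetical helper: strips whitespace, form feeds and any stray
    non-digit characters that OCR may have produced.
    """
    return re.sub(r"[^0-9]", "", raw)
```

It would be applied to the OCR output, for example digits_only(data), before converting with int(). An empty return value means no digit was recognized and should be handled explicitly.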
Code
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold, invert
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
invert = 255 - thresh
# OCR
data = pytesseract.image_to_string(invert, lang='eng', config='--psm 6')
print(data)
cv2.imshow('thresh', thresh)
cv2.imshow('invert', invert)
cv2.waitKey()
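To make the thresholding step less of a black box, here is a minimal pure-Python sketch of how Otsu's method chooses a threshold: it tries every gray level and keeps the one that maximizes the between-class variance of the two resulting pixel groups. This is an illustration only, not OpenCV's implementation. The names otsu_threshold and binarize are hypothetical, and the input is assumed to be a flat list of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for a sequence of 8-bit gray values (0-255)."""
    # Histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with value <= t
    sum_low = 0  # sum of the values of those pixels
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels in the lower class yet
        w_high = total - w_low
        if w_high == 0:
            break     # every pixel is in the lower class; nothing left to split
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance (up to a constant factor of 1 / total**2).
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, t):
    """Map values above t to 255 (white) and all others to 0 (black)."""
    return [255 if p > t else 0 for p in pixels]
```

With dark digits on a light background, binarize leaves the digits black on white, which is the polarity the answer aims for. In practice cv2.threshold with cv2.THRESH_OTSU, as used in the code above, does this work for you.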