Simple Captcha Solving

Question

I'm trying to solve some simple captcha using OpenCV and pytesseract. Some of captcha samples are:

enter image description here

I tried to the remove the noisy dots with some filters:

import cv2
import numpy as np
import pytesseract

img = cv2.imread(image_path)
_, img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
img = cv2.morphologyEx(img, cv2.MORPH_OPEN, np.ones((4, 4), np.uint8), iterations=1)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.medianBlur(img, 3)
img = cv2.GaussianBlur(img, (5, 5), 0)
cv2.imwrite('res.png', img)
print(pytesseract.image_to_string('res.png'))

Resulting tranformed images are:

enter image description here

Unfortunately pytesseract just recognizes first captcha correctly. Any other better transformation?

Final Update:

As @Neil suggested, I tried to remove noise by detecting connected pixels. To find connected pixels, I found a function named connectedComponentsWithStats, whichs detect connected pixels and assigns group (component) a label. By finding connected components and removing the ones with small number of pixels, I managed to get better overall detection accuracy with pytesseract.

And here are the new resulting images:

enter image description here

Neil · Accepted Answer

I've taken a much more direct approach to filtering ink splotches from pdf documents. I won't share the whole thing it's a lot of code, but here is the general strategy I adopted:

Use Python Pillow library to get an image object where you can manipulate pixels directly.
Binarize the image.
Find all connected pixels and how many pixels are in each group of connected pixels. You can do this using the minesweeper algorithm. Which is easy to search for.
Set some threshold value of pixels that all legitimate letters are expected to have. This will be dependent on your image resolution.
replace all black pixels in groups below the threshold with white pixels.
Convert back to image.

Simple Captcha Solving

Tags:

captcha

python-tesseract

opencv-python

Mehran Torki

1 Answers

Neil

Recent Activity

Donate For Us

Simple Captcha Solving

Tags:

captcha

python-tesseract

opencv-python

Mehran Torki

1 Answers

Neil

Related questions

Recent Activity

Donate For Us