How to extract only the characters from an image?

Question

I have images of this type, and I want to extract only the characters from them.

[image: source image]

I binarize the image with the code below and get this result:

import cv2

img = cv2.imread('the_image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, 9)

[image: binarized image]

Then I find contours on the binarized image.

# [-2] picks the contour list on both OpenCV 3 (three return values) and OpenCV 4 (two)
cnts = cv2.findContours(thresh.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts, key=cv2.contourArea, reverse=True)
for contour in cnts[:2000]:
    x, y, w, h = cv2.boundingRect(contour)
    aspect_ratio = h/w
    area = cv2.contourArea(contour)
    cv2.drawContours(img, [contour], -1, (0, 255, 0), 2)

This is what I get:

[image: contours drawn on the source image]

I need a way to filter the contours so that only the characters are selected. Then I can find their bounding boxes and extract the ROIs.

I can find contours and filter them by area, but the resolution of the source images is not consistent, because the images are taken with mobile phone cameras.
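One way to make a size filter less dependent on resolution is to compare each bounding box with the image dimensions instead of with fixed pixel counts. The following is a minimal sketch, assuming boxes in the `(x, y, w, h)` form returned by `cv2.boundingRect`. The function name `filter_boxes_relative` and all threshold values are illustrative assumptions and are not tuned for these images.

```python
def filter_boxes_relative(boxes, image_shape,
                          min_rel_height=0.02, max_rel_height=0.20,
                          min_aspect=0.4):
    """Keep boxes whose height is a plausible fraction of the image height.

    boxes: iterable of (x, y, w, h) tuples, e.g. from cv2.boundingRect.
    image_shape: (rows, cols, ...) as in img.shape.
    The thresholds are hypothetical defaults and need tuning per dataset.
    """
    img_h = image_shape[0]
    kept = []
    for (x, y, w, h) in boxes:
        if w == 0 or h == 0:
            continue  # skip degenerate boxes
        rel_height = h / float(img_h)
        aspect = h / float(w)
        if min_rel_height <= rel_height <= max_rel_height and aspect >= min_aspect:
            kept.append((x, y, w, h))
    return kept


# The same layout at two resolutions: a character-sized box,
# a long thin line, and a speck of noise.
small = [(10, 10, 20, 30), (0, 0, 400, 4), (50, 10, 2, 3)]
large = [(x * 2, y * 2, w * 2, h * 2) for (x, y, w, h) in small]
kept_small = filter_boxes_relative(small, (500, 400))
kept_large = filter_boxes_relative(large, (1000, 800))
```

In both cases only the character-sized box survives. This only helps if the form fills a roughly similar fraction of the frame in every photo, which is itself an assumption about how the pictures are taken.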

Also, because the borders of the boxes are disconnected, I can't detect the boxes accurately.

Edit:

If I discard contours whose aspect ratio (h/w) is less than 0.4, it works to some extent. But I don't know whether it will work for images of different resolutions.

for contour in cnts[:2000]:
    x, y, w, h = cv2.boundingRect(contour)
    aspect_ratio = h/w
    area = cv2.contourArea(contour)

    if aspect_ratio < 0.4:
        continue
    print(aspect_ratio)
    cv2.drawContours(img, [contour], -1, (0, 255, 0), 2)

[image: contours remaining after the aspect-ratio filter]
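On the resolution concern: the aspect ratio is a ratio of two lengths, so uniformly rescaling an image leaves it unchanged (up to pixel rounding), whereas the area grows with the square of the scale factor. A short illustration with made-up box sizes; the helper `aspect_and_area` is hypothetical:

```python
def aspect_and_area(w, h):
    """Return (h / w, w * h) for a bounding box of width w and height h."""
    return h / float(w), w * h


# Hypothetical character box at one resolution, and the same box at 3x.
ratio_1x, area_1x = aspect_and_area(20, 30)
ratio_3x, area_3x = aspect_and_area(60, 90)
print(ratio_1x, ratio_3x)  # the two ratios are equal
print(area_1x, area_3x)    # the areas differ by a factor of 9
```

So a fixed aspect-ratio cutoff is likely to carry over between resolutions better than a fixed area range, although very small contours can still produce noisy ratios.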

lucians · Accepted Answer

Not so difficult...

import cv2

img = cv2.imread('img.jpg')

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow('gray', gray)

ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU)
cv2.imshow('thresh', thresh)

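# [-2] picks the contour list on both OpenCV 3 (three return values) and OpenCV 4 (two)
ctrs = cv2.findContours(thresh.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
# Sort contours left to right by the x coordinate of their bounding box
sorted_ctrs = sorted(ctrs, key=lambda ctr: cv2.boundingRect(ctr)[0])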

for i, ctr in enumerate(sorted_ctrs):
    x, y, w, h = cv2.boundingRect(ctr)

    roi = img[y:y + h, x:x + w]

    area = w*h

    if 250 < area < 900:
        rect = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow('rect', rect)

cv2.waitKey(0)

Result

[image: result]

You can tweak the code as you like. Here the ROI is cropped from the original image; if you eventually want to run OCR on the crops, save them from the binary image instead. Better methods than filtering by area are available.
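As a rough sketch of cropping from the binary image rather than the original, the helper below takes a thresholded array and a list of `(x, y, w, h)` boxes. The name `crop_binary_rois` and the `pad` margin are assumptions for illustration and are not part of the code above.

```python
import numpy as np


def crop_binary_rois(binary, boxes, pad=2):
    """Crop each (x, y, w, h) box from a binary image, with optional padding.

    `pad` is a hypothetical margin in pixels; crops are clipped to the image.
    """
    rows, cols = binary.shape[:2]
    rois = []
    for (x, y, w, h) in boxes:
        x0, y0 = max(x - pad, 0), max(y - pad, 0)
        x1, y1 = min(x + w + pad, cols), min(y + h + pad, rows)
        rois.append(binary[y0:y1, x0:x1].copy())
    return rois


# Synthetic binary image with one white blob, 4 pixels wide and 6 pixels tall.
demo = np.zeros((20, 20), dtype=np.uint8)
demo[5:11, 8:12] = 255
rois = crop_binary_rois(demo, [(8, 5, 4, 6)], pad=2)
```

With the real data, `binary` would be `thresh` and the boxes would be those that pass the area test; each crop could then be written out with `cv2.imwrite`.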

Source: Extract ROI from image with Python and OpenCV and some of my knowledge.

Just kidding, take a look at my questions/answers.

Tags: python, image-processing, opencv, computer-vision, mnist

Arka
