
Remove background text and noise from an image using image processing with OpenCV

I have these images

enter image description here

enter image description here

I want to remove the background text from them, so that only the captcha characters remain (i.e. K6PwKA and YabVzu). The characters will later be recognized with tesseract.

This is what I have tried, but the accuracy is not good.

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Users\HPO2KOR\AppData\Local\Tesseract-OCR\tesseract.exe"
img = cv2.imread("untitled.png")
gray_image = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_filtered = cv2.inRange(gray_image, 0, 75)
cv2.imwrite("cleaned.png", gray_filtered)

How can I improve this?

Note: I tried all the suggestions I received for this question, and none of them worked for me.

EDIT: Following Elias's suggestion, I found the color of the captcha text in Photoshop by converting the image to grayscale; it came out somewhere in the range [100, 105]. I then thresholded the image on this range, but tesseract still did not give a satisfactory result.

gray_filtered = cv2.inRange(gray_image, 100, 105)
cv2.imwrite("cleaned.png", gray_filtered)
gray_inv = ~gray_filtered
cv2.imwrite("cleaned.png", gray_inv)
data = pytesseract.image_to_string(gray_inv, lang='eng')

Output :

'KEP wKA'

Result :

enter image description here

EDIT 2 :

def get_text(img_name):
    lower = (100, 100, 100)
    upper = (104, 104, 104) 
    img = cv2.imread(img_name)
    img_rgb_inrange = cv2.inRange(img, lower, upper)
    neg_rgb_image = ~img_rgb_inrange
    cv2.imwrite('neg_img_rgb_inrange.png', neg_rgb_image)
    data = pytesseract.image_to_string(neg_rgb_image, lang='eng')
    return data

gives :

enter image description here

and the text as

GXuMuUZ

Is there any way to soften it a little?

Himanshu Poddar asked Feb 10 '20 06:02




1 Answer

I didn't try this, but it might work.

Step 1: Use Photoshop to find out what color the captcha characters are. For example, "YabVzu" is (128, 128, 128).

Step 2: Use Pillow's getdata() method; it returns a sequence that contains the color of every pixel.

Then we map every item in the sequence back to the original captcha image. The sequence is in row-major order, so item i belongs to the pixel at column i % width and row i // width, and hence we know the position of every pixel in the image.

Step 3: Find all pixels whose color is close to (128, 128, 128). You may set a threshold to control how close a color has to be. This step returns another sequence; let's call it Seq a.

Step 4: Generate a picture with the same height and width as the original one, and plot every pixel in Seq a at its exact position in the picture. This gives a cleaned training item.

Step 5: Use a Keras project to break the captcha. The precision should be over 72%.
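
Steps 2 to 4 could be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: it works on a NumPy array (the type cv2.imread returns) instead of Pillow's getdata(), and the function name keep_near_color, the default target color, and the tolerance tol are all hypothetical values chosen for the example.

```python
import numpy as np


def keep_near_color(img, target=(128, 128, 128), tol=10):
    """Keep only the pixels whose color is close to `target`.

    img    : H x W x 3 uint8 array, e.g. as returned by cv2.imread.
    target : assumed character color; measure the real one first (step 1).
    tol    : assumed per-channel tolerance (the "threshold" of step 3).
    Returns an H x W uint8 image: black characters on a white background.
    """
    # Use a signed type so the subtraction cannot wrap around for uint8.
    diff = np.abs(img.astype(np.int16) - np.array(target, dtype=np.int16))
    # Step 3: a pixel matches when every channel is within `tol` of the target.
    mask = np.all(diff <= tol, axis=-1)
    # Step 4: new picture with the same height and width, matches drawn in black.
    out = np.full(img.shape[:2], 255, dtype=np.uint8)
    out[mask] = 0
    return out
```

For the images in the question, one might call it with something like target=(102, 102, 102) and a small tol, based on the [100, 105] gray range reported above; the right values depend on the actual captcha and have to be found by inspection.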

Elias answered Oct 08 '22 03:10
