How to extract white region in an image

I have a sample image like this:

[image: sample input]

I'm looking for a way to remove the noise from the image so that I end up with just black text on a white background, which I can then send to Tesseract.

I've tried a morphological opening with

kernel = np.ones((4,4),np.uint8)
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
cv2.imshow("opening", opening)

but it doesn't seem to work.

I've also tried to find contours

img = cv2.cvtColor(rotated, cv2.COLOR_BGR2GRAY)
(cnts, _) = cv2.findContours(img, cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted(cnts, key = cv2.contourArea, reverse = True)[:1]
for c in cnts:
    x,y,w,h = cv2.boundingRect(c)
    roi=rotated[y:y+h,x:x+w].copy()
    cv2.imwrite("roi.png", roi)

With the above code, I get the following contours:

[image: detected contours]

which leads to this image when cropped:

[image: cropped result]

This is still not good enough. I want black text on a white background, so that I can send it to Tesseract OCR and get a good success rate.

Is there anything else I can try?

Update

Here is an additional, similar image. This one is a bit easier because it has a smooth rectangle in it:

[image: second sample input]

336

asked Oct 05 '15 05:10

Anthony

1 Answer

The following works for your given example, although it might need tweaking for a wider range of images.

import numpy as np
import cv2

image_src = cv2.imread("input.png")
gray = cv2.cvtColor(image_src, cv2.COLOR_BGR2GRAY)
# Keep only near-white pixels (grey level above 250)
ret, gray = cv2.threshold(gray, 250, 255, cv2.THRESH_BINARY)

# [-2:] yields (contours, hierarchy) whether findContours returns two values
# (OpenCV 2.4 and 4.x) or three (OpenCV 3.x)
contours, hierarchy = cv2.findContours(gray, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]
largest_area = max(contours, key=cv2.contourArea)

# Filled mask of the largest white region
mask = np.zeros(image_src.shape, np.uint8)
cv2.drawContours(mask, [largest_area], 0, (255, 255, 255), -1)
dst = cv2.bitwise_and(image_src, mask)  # black outside the region
mask = 255 - mask
roi = cv2.add(dst, mask)                # white outside the region

# Second pass: find contours in the cleaned image
roi_gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
ret, gray = cv2.threshold(roi_gray, 250, 255, cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(gray, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

max_x = 0
max_y = 0
min_x = image_src.shape[1]
min_y = image_src.shape[0]

for c in contours:
    # Ignore small noise specks and very large areas
    if 150 < cv2.contourArea(c) < 100000:
        x, y, w, h = cv2.boundingRect(c)
        min_x = min(x, min_x)
        min_y = min(y, min_y)
        max_x = max(x + w, max_x)
        max_y = max(y + h, max_y)

# Crop to the box enclosing all remaining contours
roi = roi[min_y:max_y, min_x:max_x]
cv2.imwrite("roi.png", roi)

This gives output images like the following:

[image: example output]

[image: example output]

The code works by first locating the contour with the largest area. A filled mask is created from this contour and used to select only the area inside it, i.e. the text. The inverse of the mask is then added to the result, which turns everything outside the mask white.

Finally, contours are found again in this new image. Contours whose area falls outside a suitable size range are discarded (this ignores small noise areas), and a bounding rectangle is found for each remaining contour. An outer bounding rectangle enclosing all of these is then calculated, and the image is cropped to it to give the final result.
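
The outer bounding rectangle is just a running minimum and maximum over the individual rectangles. A minimal sketch of that arithmetic, using a hypothetical helper `enclosing_box` (not part of OpenCV) and made-up rectangles in the `(x, y, w, h)` form that `cv2.boundingRect` returns:

```python
def enclosing_box(rects):
    """Return (min_x, min_y, max_x, max_y) enclosing all (x, y, w, h) rects,
    or None if there are no rects."""
    rects = list(rects)
    if not rects:
        return None
    min_x = min(x for x, y, w, h in rects)
    min_y = min(y for x, y, w, h in rects)
    max_x = max(x + w for x, y, w, h in rects)
    max_y = max(y + h for x, y, w, h in rects)
    return min_x, min_y, max_x, max_y

# Two made-up bounding rectangles
box = enclosing_box([(10, 20, 30, 40), (5, 50, 10, 10)])
print(box)  # (5, 20, 40, 60)
```

Returning `None` for an empty list makes it easy to detect the case where no contour passes the size filter, in which the loop above would otherwise produce an empty crop.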

Update - To get the remainder of the image, i.e. with the above area removed, the following could be used:

image_src = cv2.imread("input.png")
gray = cv2.cvtColor(image_src, cv2.COLOR_BGR2GRAY)
# Everything brighter than near-black counts as foreground here
ret, gray = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)
# [-2:] yields (contours, hierarchy) on OpenCV 2.4, 3.x and 4.x alike
contours, hierarchy = cv2.findContours(gray, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]
largest_area = max(contours, key=cv2.contourArea)
mask = np.zeros(image_src.shape, np.uint8)
cv2.drawContours(mask, [largest_area], 0, (255, 255, 255), -1)
# Blank out the largest region and keep everything else
image_remainder = cv2.bitwise_and(image_src, 255 - mask)

cv2.imwrite("remainder.png", image_remainder)

126

answered Oct 21 '22 23:10

Martin Evans

Tags: python, image-processing, opencv, computer-vision
