How to programmatically remove all lines and borders from an image while keeping the text?

I'm trying to extract text from an image using Tesseract OCR. With the original input image below, the output quality is very poor (about 50%). But when I remove all the lines and borders in Photoshop first, the output improves a lot (~90%). Is there a way to remove all lines and borders from an image (keeping the text) programmatically, using OpenCV, ImageMagick, or some other technique?

Original image: [image]

Expected image: [image]

Asked Nov 27 '15 at 03:11 by wind


1 Answer

Since no one has posted a complete OpenCV solution, here's a simple approach:

  1. Obtain a binary image. Load the image, convert it to grayscale, and apply Otsu's threshold.

  2. Remove horizontal lines. Create a horizontal kernel with cv2.getStructuringElement(), isolate the horizontal lines with a morphological opening, then find their contours and paint over them with cv2.drawContours().

  3. Remove vertical lines. Repeat the same operation with a vertical kernel.


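Load the image, convert it to grayscale, then apply Otsu's threshold to obtain a binary image. With cv2.THRESH_BINARY_INV, the dark text and lines become white (255) on a black background, which is the foreground that the morphological operations and cv2.findContours() work on.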

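    import cv2

    # Load image, convert to grayscale, then inverted Otsu threshold
    image = cv2.imread('1.png')
    result = image.copy()  # lines are erased on this copy
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]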

[Image: thresholded binary image]
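For readers wondering what cv2.THRESH_OTSU does, here is a minimal pure-Python sketch of the idea on a flat list of 8-bit gray values (no OpenCV needed). Otsu's method tries every threshold and keeps the one that maximizes the between-class variance; THRESH_BINARY_INV then maps pixels above the threshold to 0 and the rest to 255. The helper names `otsu_threshold` and `binary_inv` are hypothetical, and OpenCV's own implementation may pick a different value when several thresholds score equally.

```python
# Hypothetical helpers: a sketch of Otsu's method, not OpenCV's code.
def otsu_threshold(pixels):
    """Return t (0-255) maximizing between-class variance, with classes
    'value <= t' and 'value > t'."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the '<= t' class
    sum0 = 0  # intensity sum of the '<= t' class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binary_inv(pixels, t):
    """Mimic THRESH_BINARY_INV with maxval 255: > t becomes 0, else 255."""
    return [0 if p > t else 255 for p in pixels]


# Dark "ink" (20) on a light page (230)
page = [20] * 30 + [230] * 70
t = otsu_threshold(page)
mask = binary_inv(page, t)
print(t, mask[:3], mask[-3:])
```

The ink ends up as 255 (foreground) and the page as 0, which is why the answer uses the inverted threshold.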

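Next, we create a horizontal kernel with cv2.getStructuringElement(). The size (40, 1) is given as (width, height), so the kernel is 40 pixels wide and 1 pixel tall. A morphological opening (cv2.morphologyEx() with cv2.MORPH_OPEN) keeps only horizontal runs of white pixels long enough to contain the kernel, and we find their contours with cv2.findContours(). To remove the horizontal lines, we use cv2.drawContours() to draw over each horizontal contour in white on the original image with a 5-pixel stroke, which effectively "erases" thin lines. Here are the detected horizontal lines, highlighted in green: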

    # Remove horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # findContours returns 2 values in OpenCV 2.4/4.x and 3 values in OpenCV 3.x
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        # Paint over each line in white with a 5 px stroke
        cv2.drawContours(result, [c], -1, (255, 255, 255), 5)

[Image: detected horizontal lines highlighted in green]
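To see why a wide, one-pixel-tall kernel picks out lines but not text, here is a small pure-Python sketch of a single binary opening (erosion followed by dilation) applied row by row. It assumes a 0/1 image where 1 is foreground (white in `thresh`), uses a kernel width of 5 instead of 40 so the grid stays readable, and runs one iteration rather than two. The function names are hypothetical; this illustrates the operation, it is not OpenCV's implementation.

```python
# Hypothetical sketch: binary opening of each row with a 1-pixel-tall, k-wide kernel.
def erode_row(row, k):
    """Keep a pixel only if every in-range pixel under the k-wide window is 1."""
    a = k // 2  # centred anchor
    n = len(row)
    return [min(row[j] for j in range(i - a, i - a + k) if 0 <= j < n)
            for i in range(n)]


def dilate_row(row, k):
    """Turn a pixel on if any in-range pixel under the reflected window is 1."""
    a = k // 2
    n = len(row)
    return [max(row[j] for j in range(i - (k - 1 - a), i + a + 1) if 0 <= j < n)
            for i in range(n)]


def open_horizontal(grid, k):
    """Erosion then dilation: away from the border, runs shorter than k vanish."""
    return [dilate_row(erode_row(row, k), k) for row in grid]


# 1 = foreground (white in `thresh`), 0 = background
grid = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # 10-pixel horizontal rule
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0],  # short, text-like strokes
]
lines = open_horizontal(grid, 5)
for row in lines:
    print("".join("#" if v else "." for v in row))
```

The 10-pixel rule comes back unchanged, while the 3-pixel and 1-pixel strokes are gone, so only the rule would be handed to cv2.findContours().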

Similarly, we create a vertical kernel of size (1, 40), i.e. 1 pixel wide and 40 pixels tall, isolate the vertical lines with an opening, find their contours, and paint over each one in white. Here are the detected vertical lines, highlighted in green:

    # Remove vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
    remove_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
    cnts = cv2.findContours(remove_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # findContours returns 2 values in OpenCV 2.4/4.x and 3 values in OpenCV 3.x
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    for c in cnts:
        # Paint over each line in white with a 5 px stroke
        cv2.drawContours(result, [c], -1, (255, 255, 255), 5)

[Image: detected vertical lines highlighted in green]
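Both snippets use `cnts = cnts[0] if len(cnts) == 2 else cnts[1]` because cv2.findContours() returns (contours, hierarchy) in OpenCV 2.4 and 4.x but (image, contours, hierarchy) in OpenCV 3.x. The same idea can be wrapped in a named helper; the name `grab_contours` is just illustrative, and the tuples below are placeholders rather than real contours.

```python
def grab_contours(found):
    """Return the contour list from a cv2.findContours() result.

    Hypothetical helper: OpenCV 2.4/4.x return (contours, hierarchy),
    OpenCV 3.x returns (image, contours, hierarchy).
    """
    if len(found) == 2:
        return found[0]
    if len(found) == 3:
        return found[1]
    raise ValueError("unexpected findContours() return value")


# Placeholder stand-ins for the two return shapes
v4_style = (["c1", "c2"], "hierarchy")
v3_style = ("image", ["c1", "c2"], "hierarchy")
print(grab_contours(v4_style), grab_contours(v3_style))
```

Raising on any other length makes an unexpected return shape fail loudly instead of silently picking the wrong element.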

After painting over both the horizontal and vertical lines in white, here's the result:

[Image: result with the lines removed]
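The whole pipeline can be mimicked on a toy binary grid to check that a table border disappears while a glyph inside it survives. This pure-Python sketch simplifies under stated assumptions: 1 = ink (white in `thresh`), one opening iteration, a kernel length of 5 instead of 40, and line pixels are simply zeroed in the mask instead of being painted white on the color image with a 5-pixel stroke. All function names are hypothetical.

```python
# Hypothetical toy version of the line-removal pipeline (pure Python, 0/1 grid).
def erode(grid, kw, kh):
    """Binary erosion with a kw x kh rectangle; out-of-range pixels ignored."""
    h, w = len(grid), len(grid[0])
    ax, ay = kw // 2, kh // 2
    return [[min(grid[yy][xx]
                 for yy in range(y - ay, y - ay + kh)
                 for xx in range(x - ax, x - ax + kw)
                 if 0 <= yy < h and 0 <= xx < w)
             for x in range(w)]
            for y in range(h)]


def dilate(grid, kw, kh):
    """Binary dilation with the reflected kw x kh rectangle."""
    h, w = len(grid), len(grid[0])
    ax, ay = kw // 2, kh // 2
    return [[max(grid[yy][xx]
                 for yy in range(y - (kh - 1 - ay), y + ay + 1)
                 for xx in range(x - (kw - 1 - ax), x + ax + 1)
                 if 0 <= yy < h and 0 <= xx < w)
             for x in range(w)]
            for y in range(h)]


def opening(grid, kw, kh):
    return dilate(erode(grid, kw, kh), kw, kh)


def remove_lines(binary, length):
    horiz = opening(binary, length, 1)  # plays the role of the (40, 1) kernel
    vert = opening(binary, 1, length)   # plays the role of the (1, 40) kernel
    # Erase every pixel that belongs to a detected line, keep everything else
    return [[0 if horiz[y][x] or vert[y][x] else v
             for x, v in enumerate(row)]
            for y, row in enumerate(binary)]


# A bordered cell with a small "T"-shaped glyph inside (1 = ink)
cell = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_lines(cell, 5)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

The border is removed by the two openings, while the glyph's strokes (3 pixels at most) are too short for either kernel and are left intact.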


Note: Depending on the image, you may have to modify the kernel size. For instance, increasing the horizontal kernel from (40, 1) to, say, (80, 1) restricts detection to longer horizontal lines, while a shorter kernel also picks up shorter segments. To detect thicker horizontal lines, you could increase the kernel's height (the second value, since the size is given as (width, height)), e.g. to (80, 2). You could also increase the number of iterations passed to cv2.morphologyEx(). Similarly, you can adjust the vertical kernel to detect more or fewer vertical lines. There is a trade-off either way: a larger kernel may miss some of the lines, while a smaller one may also catch parts of the text. If thick lines are only partially erased, you can raise the drawing thickness passed to cv2.drawContours() (5 here) or pass -1 to fill each contour instead. Again, it all depends on the input image.
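On the kernel-size and iterations trade-off: my understanding is that cv2.morphologyEx() with MORPH_OPEN and iterations=n performs n erosions followed by n dilations. If so, with a kernel of length k a run of foreground pixels must be at least n*(k - 1) + 1 pixels long to survive, which for the (40, 1) kernel with iterations=2 used above is 79 pixels rather than 40. The 1-D sketch below checks that rule under this assumption, using hypothetical helper names and idealized one-pixel-thick runs.

```python
# Hypothetical 1-D sketch of MORPH_OPEN with `iterations`, assuming it means
# `iterations` erosions followed by `iterations` dilations.
def erode_1d(sig, k):
    a = k // 2
    n = len(sig)
    return [min(sig[j] for j in range(i - a, i - a + k) if 0 <= j < n)
            for i in range(n)]


def dilate_1d(sig, k):
    a = k // 2
    n = len(sig)
    return [max(sig[j] for j in range(i - (k - 1 - a), i + a + 1) if 0 <= j < n)
            for i in range(n)]


def open_1d(sig, k, iterations=1):
    for _ in range(iterations):
        sig = erode_1d(sig, k)
    for _ in range(iterations):
        sig = dilate_1d(sig, k)
    return sig


def survives(length, k, iterations, pad=100):
    """True if a run of `length` 1s, padded with 0s on both sides, survives."""
    sig = [0] * pad + [1] * length + [0] * pad
    return any(open_1d(sig, k, iterations))


for length in (30, 60, 80):
    print(length, survives(length, 40, 1), survives(length, 40, 2))
```

Raising iterations therefore acts like lengthening the kernel, which is another way to stop shorter text strokes from being mistaken for lines.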

Full code for completeness

    import cv2

    # Load image, convert to grayscale, then inverted Otsu threshold
    image = cv2.imread('1.png')
    if image is None:
        raise FileNotFoundError('could not read 1.png')
    result = image.copy()  # lines are erased on this copy
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Remove horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1))
    remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
    cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]  # OpenCV 2.4/4.x vs 3.x
    for c in cnts:
        cv2.drawContours(result, [c], -1, (255, 255, 255), 5)

    # Remove vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40))
    remove_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
    cnts = cv2.findContours(remove_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]  # OpenCV 2.4/4.x vs 3.x
    for c in cnts:
        cv2.drawContours(result, [c], -1, (255, 255, 255), 5)

    cv2.imshow('thresh', thresh)
    cv2.imshow('result', result)
    cv2.imwrite('result.png', result)
    cv2.waitKey()
Answered Sep 21 '22 at 16:09 by nathancy