
Finding contours with lines of text in OpenCV

I am writing a text recognition program, and I have a problem with sorting contours. The program works fine for a single line of text, but on a whole block of text it fails to detect the individual lines about 80% of the time. What would be an efficient way to extract one line of text, and then each of the other lines, one at a time?

What I want to achieve:

[image: desired result]

Anže Mur asked Jun 09 '18 19:06




1 Answer

There is a sequence of steps to achieve this:

  1. Find the optimum threshold to binarize your image. I used Otsu's threshold.
  2. Find a suitable morphological operation that merges each line of text into a single region along the horizontal direction. Choose a kernel that is wider than it is tall.
  3. Find the contours of the resulting regions and draw bounding boxes around them.
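To see why step 2 calls for a kernel that is wider than it is tall, here is a toy sketch in plain Python. It is not OpenCV code: `dilate_row` and `count_runs` are hypothetical helpers, and the pixel row is made up. It dilates a single binary row with a flat kernel and counts how many separate blobs remain.

```python
def dilate_row(row, width):
    """Horizontal dilation of one binary row with a flat kernel of odd `width`."""
    half = width // 2
    n = len(row)
    # Each output pixel is the maximum over the kernel window centred on it.
    return [max(row[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def count_runs(row):
    """Number of separate runs of 1s, i.e. separate blobs along this row."""
    return sum(1 for i, v in enumerate(row) if v and (i == 0 or not row[i - 1]))


# Two "letters" 3 px apart, then a third one 9 px further right (made-up data).
row = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(count_runs(row))                   # blobs before dilation
print(count_runs(dilate_row(row, 5)))    # narrow kernel: only the close pair merges
print(count_runs(dilate_row(row, 11)))   # wide kernel: the whole row becomes one blob
```

A kernel of odd width k bridges gaps of up to k - 1 pixels, so the width should exceed the largest gap between words on a line, while the height should stay smaller than the gap between lines so that neighbouring lines are not merged.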

UPDATE

Here is the implementation:

import cv2

x = 'C:/Users/Desktop/text.jpg'

img = cv2.imread(x)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

#--- performing Otsu threshold (inverted, so the text becomes white on black) ---
ret, thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh1)


#--- choosing the right kernel ---
#--- getStructuringElement takes (width, height), so (15, 3) means:
#--- 3 rows (to join dots above letters 'i' and 'j')
#--- and 15 columns (to join neighboring letters in words and neighboring words)
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)
cv2.imshow('dilation', dilation)


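#--- Finding contours ---
#--- findContours returns 3 values in OpenCV 3.x and 2 in OpenCV 4.x;
#--- taking the last two works with both
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

im2 = img.copy()
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('final', im2)
cv2.waitKey(0)  # keep the windows open until a key is pressed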

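The contours come back in no guaranteed reading order, which matters if you want the lines one at a time, as the question asks. Below is a minimal sketch of one way to order the boxes top to bottom. The helper `sort_line_boxes`, its `min_height` cutoff and the sample boxes are my own assumptions for illustration, not part of the code above.

```python
def sort_line_boxes(boxes, min_height=5):
    """Order (x, y, w, h) boxes top to bottom, dropping very short specks.

    `min_height` is an assumed noise cutoff in pixels; tune it for your images.
    """
    kept = [b for b in boxes if b[3] >= min_height]
    # Sort by the top edge first, then by the left edge to break ties.
    return sorted(kept, key=lambda b: (b[1], b[0]))


# Boxes in the (x, y, w, h) form that cv2.boundingRect returns (made-up values).
boxes = [(12, 140, 400, 22), (10, 20, 380, 24), (300, 5, 3, 2), (11, 80, 410, 23)]

for x, y, w, h in sort_line_boxes(boxes):
    print((x, y, w, h))
```

With the variables from the code above, the boxes could be collected as `[cv2.boundingRect(cnt) for cnt in contours]`, and each line could then be cropped with `img[y:y + h, x:x + w]`. Sorting by the top edge alone assumes each line ends up as a single box; if a line splits into several boxes, you would need to group boxes with overlapping vertical ranges first.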

Jeru Luke answered Oct 11 '22 15:10