 

Merging regions in MSER for identifying text lines in OCR

I am using MSER to identify text regions in an image. I am using the following code to extract the regions and save them as images. Currently, each identified region is saved as a separate image, but I want all the regions belonging to one line of text to be merged into a single image.

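import cv2

img = cv2.imread('newF.png')
mser = cv2.MSER_create()

# Upscale the image 2x
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()

# detectRegions returns (point sets, bounding boxes); regions[0] is the list of point sets
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
# Draw the convex hull of every region in green
cv2.polylines(vis, hulls, True, (0, 255, 0))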

How can I stitch the images that belong to a single line together? I understand that the logic will mostly be based on some heuristic for identifying areas with nearby y-coordinates.

But how exactly can the regions be merged in OpenCV? I am stuck on this because I am new to OpenCV. Any help would be appreciated.

Attaching a sample image: [image]

The desired outputs are as follows: [image]

Another line: [image]

Another line: [image]

Asked Feb 05 '18 04:02 by Amrith Krishna


2 Answers

If you are particular about using MSER, then, as you mentioned, a heuristic for combining areas with nearby y-coordinates can be used. The following approach might not be efficient, and I will try to optimize it, but it should give you an idea of how to tackle the problem.

  1. First, let us plot all the bboxes determined by MSER:

    coordinates, bboxes = mser.detectRegions(gray)
    for bbox in bboxes:
        x, y, w, h = bbox
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    

    This gives us: [image: MSER-detected bboxes]

  2. Now, it is evident from the bboxes that the heights vary quite a lot, even within a single line. Thus, to cluster bounding boxes into a single line, we need to choose an interval. I couldn't come up with anything foolproof, so I went with half the median of all the bbox heights, which works well for the given case.

    bboxes_list = list()
    heights = list()
    for bbox in bboxes:
        x, y, w, h = bbox
        # Store each bbox as its left-top and right-bottom corners, as plain ints for cv2.rectangle
        bboxes_list.append([int(x), int(y), int(x + w), int(y + h)])
        heights.append(h)
    heights = sorted(heights)  # Sort heights
    median_height = heights[len(heights) // 2] / 2  # Half of the median height (// keeps the index an int in Python 3)
    
  3. Now, to group the bounding boxes given a particular interval for the y-coordinates (here, half the median height), I am modifying a snippet that I once found on Stack Overflow (I will add the source once I find it). This generator takes a list along with an interval as input and yields groups, where every pair of consecutive bounding boxes in a group differs in its top y-coordinate by at most the interval. Please note that the iterable/list needs to be sorted by y-coordinate.

    def grouper(iterable, interval=2):
        prev = None
        group = []
        for item in iterable:
            # item[1] is the top y-coordinate; a jump larger than the interval starts a new group
            if prev is None or abs(item[1] - prev[1]) <= interval:
                group.append(item)
            else:
                yield group
                group = [item]
            prev = item
        if group:
            yield group
    
  4. Thus, before grouping the bounding boxes, they need to be sorted based on the y-coordinate. After grouping, we iterate through each group, and determine the min x-coordinate, min y-coordinate, max x-coordinate, and max y-coordinate required to draw a bounding box that covers all the bounding boxes in a given group.

    bboxes_list = sorted(bboxes_list, key=lambda k: k[1])  # Sort the bounding boxes by y1 (y of the left-top corner)
    combined_bboxes = grouper(bboxes_list, median_height)  # Group the bounding boxes
    for group in combined_bboxes:
        x_min = min(group, key=lambda k: k[0])[0]  # Find min of x1
        x_max = max(group, key=lambda k: k[2])[2]  # Find max of x2
        y_min = min(group, key=lambda k: k[1])[1]  # Find min of y1
        y_max = max(group, key=lambda k: k[3])[3]  # Find max of y2
        cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
    

    Final resulting image: [image: lines combined]
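For reference, steps 2 to 4 can be folded into a single pure-Python function. This is a minimal sketch under the same assumptions as the answer: the input is the list of `(x, y, w, h)` boxes that `mser.detectRegions(gray)` returns as its second value, the interval is half the median height, and only consecutive boxes (sorted by top y-coordinate) are compared. The name `merge_line_boxes` is hypothetical, and the function does not call OpenCV, so it can be tried on plain tuples.

```python
def merge_line_boxes(bboxes):
    """Group (x, y, w, h) boxes into text lines; return one (x1, y1, x2, y2) box per line.

    Hypothetical helper restating steps 2-4: interval = half the median height,
    boxes sorted by top y, consecutive boxes within the interval share a line.
    """
    if len(bboxes) == 0:  # len() also works for the NumPy array MSER returns
        return []
    # Convert (x, y, w, h) to (x1, y1, x2, y2) corner form.
    corners = [(x, y, x + w, y + h) for x, y, w, h in bboxes]
    heights = sorted(h for _, _, _, h in bboxes)
    interval = heights[len(heights) // 2] / 2  # half of the (upper) median height

    corners.sort(key=lambda b: b[1])  # sort by top y-coordinate
    groups = [[corners[0]]]
    for prev, box in zip(corners, corners[1:]):
        if abs(box[1] - prev[1]) <= interval:
            groups[-1].append(box)  # close enough to the previous box: same line
        else:
            groups.append([box])  # jump in y: start a new line

    # One enclosing rectangle per group: min of the top-left, max of the bottom-right.
    return [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups
    ]
```

Each returned tuple is `(x_min, y_min, x_max, y_max)` and can be passed to `cv2.rectangle` exactly as in step 4. Because each box is compared only with its predecessor, a long run of boxes whose tops drift slowly downward can still end up in one group; comparing against the first box of the current group would be one stricter variant.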

Again, I would like to reiterate that there might be ways to optimize this approach further. The goal is to give you an idea of how such problems can be tackled.
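The question asks for each line to be saved as a separate image, while the code above only draws rectangles. A minimal sketch, assuming the merged boxes are in the `(x_min, y_min, x_max, y_max)` form computed in step 4 and that the image is a NumPy array (which is what `cv2.imread` returns): each line can be cropped with array slicing. The helper name `crop_lines` is hypothetical.

```python
import numpy as np


def crop_lines(img, line_boxes):
    """Return one sub-image per (x1, y1, x2, y2) line box (hypothetical helper).

    NumPy indexes images as img[row, column], i.e. img[y, x], so the y-range
    comes first in the slice. Coordinates are clamped to the image bounds.
    """
    h, w = img.shape[:2]
    crops = []
    for x1, y1, x2, y2 in line_boxes:
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        # .copy() so that later drawing on img does not change the crop
        crops.append(img[y1:y2, x1:x2].copy())
    return crops
```

Crop from a copy of the image made before any `cv2.rectangle` calls; otherwise the green boxes drawn in steps 1 and 4 end up inside the crops. Each crop can then be written out with `cv2.imwrite`, e.g. `cv2.imwrite('line_0.png', crops[0])`.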

Answered Nov 17 '22 02:11 by GaneshTata


Maybe even something as primitive as erode/dilate could be made to work in your case? For example, if I apply an erode operation followed by a dilate operation to your original image, mostly in the horizontal direction, e.g.:

import numpy as np
img = cv2.erode(img, np.ones((1, 20), np.uint8))  # 1-pixel-high kernel: acts horizontally only
img = cv2.dilate(img, np.ones((1, 22), np.uint8))

the result is something like:

[image]

So if we draw that over the original image, it becomes:

[image]
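Why does eroding and then dilating with a one-pixel-high kernel join a line's characters? With dark text on a white background, erosion is a minimum filter, so dark pixels spread sideways and bridge gaps narrower than the kernel; dilation, a maximum filter, then shrinks the dark runs back, leaving the bridges filled. The toy one-row sketch below illustrates this on a Python list, assuming 0 = text and 255 = background. `erode_row` and `dilate_row` are hypothetical helpers with simplified anchor and border handling, not a bit-exact replica of `cv2.erode`/`cv2.dilate`.

```python
def _window(row, i, k):
    # Centred window of width k around index i, cut off at the row's ends.
    lo = max(0, i - k // 2)
    hi = min(len(row), i - k // 2 + k)
    return row[lo:hi]


def erode_row(row, k):
    # Grayscale erosion = minimum filter: dark (0) pixels spread sideways.
    return [min(_window(row, i, k)) for i in range(len(row))]


def dilate_row(row, k):
    # Grayscale dilation = maximum filter: bright (255) pixels spread back.
    return [max(_window(row, i, k)) for i in range(len(row))]


W, B = 255, 0
row = [W] * 17
row[2] = row[5] = row[14] = B  # two "characters" close together, one far away
closed = dilate_row(erode_row(row, 3), 3)
dark = [i for i, v in enumerate(closed) if v == B]
```

Here `dark` is `[2, 3, 4, 5, 14]`: the two-pixel gap between the characters at indices 2 and 5 is filled, while the eight-pixel gap before index 14 survives. The answer's 1×20 / 1×22 kernels apply the same idea to every row of the image, with a slightly wider dilation that trims the dark runs a little more.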

I didn't resize the original image as you do (probably to detect those small separate dots and stuff). Not ideal (I don't know how MSER works), but with enough tweaking maybe you could even use simple detection of connected components with this?
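The connected-components idea can be sketched as follows, assuming the eroded/dilated image has first been turned into a binary mask in which text pixels are truthy (for example by thresholding). This is a pure-Python breadth-first flood fill using 4-connectivity, written only to show the idea; the name `component_boxes` is hypothetical. In practice OpenCV's `cv2.connectedComponentsWithStats` does this labelling directly and reports each component's left, top, width and height in its `stats` output.

```python
from collections import deque


def component_boxes(mask):
    """Bounding boxes (x, y, w, h) of 4-connected regions of truthy pixels.

    mask is a list of rows; each truthy entry is a foreground (text) pixel.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            y0 = y1 = r
            x0 = x1 = c
            while queue:
                y, x = queue.popleft()
                y0, y1 = min(y0, y), max(y1, y)
                x0, x1 = min(x0, x), max(x1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

Each box comes back in the same `(x, y, w, h)` form as MSER's bounding boxes, so after a strong enough horizontal erode/dilate there should ideally be one box per text line.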

Answered Nov 17 '22 04:11 by Headcrab