Merging regions in MSER for identifying text lines in OCR

Question

I am using MSER to identify text regions in an image. I am using the following code to extract the regions and save them as images. Currently, each identified region is saved as a separate image, but I want the regions belonging to one line of text to be merged into a single image.

import cv2

img = cv2.imread('newF.png')
mser = cv2.MSER_create()


img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()

regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))

How can I stitch together the regions that belong to a single line? I understand the logic will mostly be based on some heuristic for identifying areas with nearby y-coordinates.

But how exactly can the regions be merged in OpenCV? I am missing this part as I am new to OpenCV. Any help would be appreciated.

Attaching a sample image: [image]

The desired outputs are as follows: [image]

Another line: [image]

Another line: [image]

GaneshTata · Accepted Answer

If you are particular about using MSER, then, as you mentioned, a heuristic for combining areas with nearby y-coordinates can be used. The following approach might not be efficient, and I will try and optimize it, but it might give you an idea about how to tackle the problem.

First, let us plot all the bboxes determined by MSER:

coordinates, bboxes = mser.detectRegions(gray)
for bbox in bboxes:
    x, y, w, h = bbox
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

This gives us: [image: MSER detected bboxes]

Now, it is evident from the bboxes that the heights vary quite a lot, even within a single line. Thus, to cluster the bounding boxes of a single line, we have to come up with an interval for the y-coordinates. I couldn't come up with something foolproof, so I went with half the median of all the heights of the given bboxes, which works well for the given case.

bboxes_list = list()
heights = list()
for bbox in bboxes:
    x, y, w, h = bbox
    bboxes_list.append([x, y, x + w, y + h])  # Create list of bounding boxes, with each bbox containing the left-top and right-bottom coordinates
    heights.append(h)
heights = sorted(heights)  # Sort heights
median_height = heights[len(heights) // 2] / 2  # Half of the median height; // keeps the index an integer in Python 3
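As a cross-check on the interval computation, here is a minimal sketch using only the standard library. The function name `half_median_height` and the sample boxes are made up for illustration; `statistics.median_high` picks the same element as indexing the sorted list at its midpoint, so it should match the approach described above.

```python
from statistics import median_high

def half_median_height(bboxes):
    """Return half of the (upper) median height of (x, y, w, h) boxes."""
    heights = [h for (_x, _y, _w, h) in bboxes]
    # median_high returns the middle element, or the larger of the two
    # middle elements when the number of boxes is even.
    return median_high(heights) / 2

# Hypothetical boxes in MSER's (x, y, w, h) layout
sample_bboxes = [(0, 0, 10, 20), (12, 2, 8, 30), (25, 1, 9, 10), (40, 0, 11, 24)]
interval = half_median_height(sample_bboxes)
```

Whether half the median is a good interval depends on the image, so it may need tuning for other fonts or layouts.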

Now, to group the bounding boxes given a particular interval for the y-coordinates (here, half the median height), I am modifying a snippet that I had once found on Stack Overflow (I will add the source once I find it). This function takes a list and an interval as input and yields groups, where each bounding box's y-coordinate differs from that of the previous box in its group by at most the interval. Please note that the list needs to be sorted by y-coordinate first.
```
def grouper(iterable, interval=2):
    prev = None
    group = []
    for item in iterable:
        if prev is None or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group
```
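To make the grouping behaviour concrete, here is a small self-contained sketch that repeats the function and runs it on made-up boxes in `[x1, y1, x2, y2]` form. The coordinates and the interval of 8 are assumptions chosen only for illustration.

```python
def grouper(iterable, interval=2):
    """Yield runs of boxes whose y1 is within `interval` of the previous box."""
    prev = None
    group = []
    for item in iterable:
        if prev is None or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

# Hypothetical boxes: three on an upper line, two on a lower line
boxes = [[10, 12, 30, 40], [35, 10, 60, 42], [12, 80, 40, 110],
         [45, 84, 70, 108], [80, 14, 100, 38]]
boxes_sorted = sorted(boxes, key=lambda b: b[1])  # sort by y1 first
groups = list(grouper(boxes_sorted, interval=8))
```

Note that each box is compared with the box just before it, not with the first box of its group, so a long run of slowly drifting y-coordinates can end up in one group even if its first and last boxes are far apart.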

Thus, before grouping the bounding boxes, they need to be sorted based on the y-coordinate. After grouping, we iterate through each group, and determine the min x-coordinate, min y-coordinate, max x-coordinate, and max y-coordinate required to draw a bounding box that covers all the bounding boxes in a given group.

bboxes_list = sorted(bboxes_list, key=lambda k: k[1])  # Sort the bounding boxes by y1 (y of the left-top corner)
combined_bboxes = grouper(bboxes_list, median_height)  # Group the bounding boxes
for group in combined_bboxes:
    x_min = min(group, key=lambda k: k[0])[0]  # Find min of x1
    x_max = max(group, key=lambda k: k[2])[2]  # Find max of x2
    y_min = min(group, key=lambda k: k[1])[1]  # Find min of y1
    y_max = max(group, key=lambda k: k[3])[3]  # Find max of y2
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
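Since the question asks for one image per text line, the merged boxes can also be used to crop the image instead of only drawing on it. The following is a minimal sketch using NumPy slicing; `merge_group`, `crop_lines`, and the blank test image are hypothetical names and data, not part of the answer's code.

```python
import numpy as np

def merge_group(group):
    """Smallest [x1, y1, x2, y2] box covering every box in the group."""
    return [min(b[0] for b in group), min(b[1] for b in group),
            max(b[2] for b in group), max(b[3] for b in group)]

def crop_lines(image, groups):
    """Return one cropped copy of `image` per group of boxes."""
    crops = []
    for group in groups:
        x1, y1, x2, y2 = merge_group(group)
        # NumPy images are indexed [row, column], i.e. [y, x]
        crops.append(image[y1:y2, x1:x2].copy())
    return crops

# Hypothetical 120x100 BGR image and two groups of [x1, y1, x2, y2] boxes
image = np.zeros((120, 100, 3), dtype=np.uint8)
groups = [
    [[35, 10, 60, 42], [10, 12, 30, 40], [80, 14, 100, 38]],
    [[12, 80, 40, 110], [45, 84, 70, 108]],
]
line_crops = crop_lines(image, groups)
```

Each crop could then be written out with `cv2.imwrite`, for example to a file name such as `line_0.png` (the name is only an example). It is probably best to crop from an untouched copy of the image, because the rectangles drawn earlier would otherwise show up in the saved lines.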

Final resultant image: [image: Lines_combined]

Again, I would like to reiterate that there might be ways to optimize this approach further. The goal is to give you an idea of how such problems can be tackled.

Headcrab · Answer

Maybe even something as primitive as dilate-erode could be made to work in your case? For example, if I apply an erode operation followed by a dilate operation to your original image, mostly in the horizontal direction, e.g.:

import numpy as np
img = cv2.erode(img, np.ones((1, 20)))  # 1x20 kernel: acts only along each row
img = cv2.dilate(img, np.ones((1, 22)))  # slightly wider horizontal kernel

the result is something like: [image]

So if we draw that over the original image, it becomes: [image]

I didn't resize the original image as you do (probably to detect those small separate dots and stuff). Not ideal (I don't know how MSER works), but with enough tweaking maybe you could even use simple detection of connected components with this?

Tags: python, opencv, bounding-box, mser, image-stitching

Amrith Krishna
