I am using MSER to identify text regions in an image. I am using the following code to extract the regions and save them as images. Currently, each identified region is saved as a separate image, but I want the regions belonging to a single line of text to be merged into one image.
import cv2
img = cv2.imread('newF.png')
mser = cv2.MSER_create()
img = cv2.resize(img, (img.shape[1]*2, img.shape[0]*2))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
vis = img.copy()
regions = mser.detectRegions(gray)
hulls = [cv2.convexHull(p.reshape(-1, 1, 2)) for p in regions[0]]
cv2.polylines(vis, hulls, 1, (0,255,0))
How can I stitch together the regions that belong to a single line? I gather the logic will mostly rest on some heuristic for identifying regions with nearby y-coordinates.
But how exactly can the regions be merged in OpenCV? I am missing this part as I am new to OpenCV. Any help would be appreciated.
Attaching a sample image
The desired output(s) is as follows
Another line
Another Line
If you are particular about using MSER, then, as you mentioned, a heuristic for combining areas with nearby y-coordinates can be used. The following approach might not be efficient, and I will try and optimize it, but it might give you an idea about how to tackle the problem.
First, let us plot all the bboxes determined by MSER:
coordinates, bboxes = mser.detectRegions(gray)
for bbox in bboxes:
    x, y, w, h = bbox
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
This gives us -
Now, it is evident from the bboxes that the heights vary quite a lot, even within a single line. Thus, to cluster the bounding boxes of a single line, we have to come up with an interval for the y-coordinates. I couldn't come up with something foolproof, so I went with half the median of all the heights of the given bboxes, which works well for the given case.
bboxes_list = list()
heights = list()
for bbox in bboxes:
    x, y, w, h = bbox
    # Store each bbox as its left-top and right-bottom coordinates
    bboxes_list.append([x, y, x + w, y + h])
    heights.append(h)
heights = sorted(heights) # Sort heights
median_height = heights[len(heights) // 2] / 2 # Half of the median height (integer index, needed on Python 3)
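As a cross-check, the same threshold can be sketched with the standard library. This is a minimal sketch, assuming the boxes are (x, y, w, h) tuples like the ones returned by detectRegions; the helper name half_median_height and the sample boxes are made up for illustration. Note that statistics.median averages the two middle values for an even number of boxes, so it can differ slightly from the index-based pick above.

```python
from statistics import median

def half_median_height(bboxes):
    # Hypothetical helper: half of the median box height.
    heights = [h for (_x, _y, _w, h) in bboxes]
    return median(heights) / 2

# Made-up (x, y, w, h) boxes, not taken from the real image.
sample = [(0, 0, 10, 12), (12, 1, 8, 20), (22, 2, 9, 16)]
threshold = half_median_height(sample)
```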
Now, to group the bounding boxes given a particular interval for the y-coordinates (here, half the median height), I am modifying a snippet that I once found on Stack Overflow (I will add the source once I find it). This function takes a list and an interval as input, and returns a list of groups, where each bounding box in a group differs from the previous box in the sorted order by at most the interval in its y-coordinate. Please note that the iterable / list needs to be sorted by y-coordinate.
def grouper(iterable, interval=2):
    prev = None
    group = []
    for item in iterable:
        if not prev or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group
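To see how the grouping behaves, here is a small sketch on made-up boxes. The function is repeated so the snippet runs on its own; the box values are assumptions, not data from the real image. Because each box is compared with the previous box rather than with the first box of the group, boxes can chain together even when the first and last differ by more than the interval.

```python
def grouper(iterable, interval=2):
    # Same logic as the function above, repeated so this snippet is self-contained.
    prev = None
    group = []
    for item in iterable:
        if prev is None or abs(item[1] - prev[1]) <= interval:
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

# Made-up boxes as [x1, y1, x2, y2], already sorted by y1.
boxes = [[0, 10, 5, 20], [6, 12, 11, 22], [12, 14, 17, 24], [0, 40, 5, 50]]
groups = list(grouper(boxes, interval=2))
# The boxes at y1 = 10 and y1 = 14 land in the same group through the box at y1 = 12.
```

If that chaining turns out to be too permissive for a skewed scan, comparing against the first box of the current group is one possible alternative.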
Thus, before grouping the bounding boxes, they need to be sorted based on the y-coordinate. After grouping, we iterate through each group, and determine the min x-coordinate, min y-coordinate, max x-coordinate, and max y-coordinate required to draw a bounding box that covers all the bounding boxes in a given group.
bboxes_list = sorted(bboxes_list, key=lambda k: k[1]) # Sort the bounding boxes based on y1 coordinate ( y of the left-top coordinate )
combined_bboxes = grouper(bboxes_list, median_height) # Group the bounding boxes
for group in combined_bboxes:
    x_min = min(group, key=lambda k: k[0])[0] # Find min of x1
    x_max = max(group, key=lambda k: k[2])[2] # Find max of x2
    y_min = min(group, key=lambda k: k[1])[1] # Find min of y1
    y_max = max(group, key=lambda k: k[3])[3] # Find max of y2
    cv2.rectangle(img, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
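Since the question asks for one image per line rather than a drawn rectangle, the merged box can also be used to crop the image. The following is a minimal sketch, assuming the image is a NumPy array (as returned by cv2.imread) and each group is a list of [x1, y1, x2, y2] boxes; the helper names merge_boxes and crop_lines and the sample values are hypothetical.

```python
import numpy as np

def merge_boxes(group):
    # Smallest box that covers every [x1, y1, x2, y2] box in the group.
    x1 = min(b[0] for b in group)
    y1 = min(b[1] for b in group)
    x2 = max(b[2] for b in group)
    y2 = max(b[3] for b in group)
    return x1, y1, x2, y2

def crop_lines(img, groups):
    # Return one cropped copy of the image per group (rows are y, columns are x).
    crops = []
    for group in groups:
        x1, y1, x2, y2 = merge_boxes(group)
        crops.append(img[y1:y2, x1:x2].copy())
    return crops

# Stand-in for a real image and for the output of the grouping step.
img = np.zeros((100, 200, 3), dtype=np.uint8)
groups = [[[10, 10, 40, 30], [50, 12, 90, 28]], [[10, 60, 120, 80]]]
crops = crop_lines(img, groups)
```

Each crop could then be written to disk with cv2.imwrite. It is probably best to crop from a copy of the image taken before any rectangles were drawn on it, so the green outlines do not end up in the saved lines.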
Final resultant image -
Again, I would like to reiterate that there might be ways to optimize this approach further. The goal is to give you an idea about how such problems can be tackled.
Maybe even something as primitive as dilate-erode could be made to work in your case? For example, if I use an erode operation followed by a dilate operation on your original image, mostly in the horizontal direction, e.g.:
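import numpy as np
img = cv2.erode(img, np.ones((1, 20))) # Smear dark text horizontally so characters in a line join up
img = cv2.dilate(img, np.ones((1, 22))) # Shrink the dark blobs back with a slightly wider kernel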
the result is something like:
So if we draw that over the original image, it becomes:
I didn't resize the original image as you do (probably to detect those small separate dots and stuff). Not ideal (I don't know how MSER works), but with enough tweaking maybe you could even use simple detection of connected components with this?