Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining nearby bounding boxes along one axis

Here, I use Google Vision API to detect text from the following image. The red box indicates samples of a combined bounding box that I would like to obtain.

enter image description here

Basically, I get the text output and bounding box from the above image. Here, I would like to merge the bounding boxes and texts that are located along the same row (left to right). For example, the first line will get merged together:

[{'description': 'บริษัทไปรษณีย์ไทย',
  'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
 {'description': 'จํากัด',
 'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
 ...

to

[{'description': 'บริษัทไปรษณีย์ไทยจำกัด',
  'vertices': [(528, 202), (809, 202), (809, 222), (528, 222)]},
 ...

These following rows

 {'description': 'RP',
  'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
 {'description': '8147',
  'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
 {'description': '3609',
  'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
 {'description': '7',
  'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]},
 {'description': 'TH',

will get merged together.

Current approach

I looked into - Solution using OpenCV - Non-max suppression algorithm

but cannot produce one specific for my need since it relies on percentage of overlapping pixels. If someone can help, that would be great!

Please try to use bounding box data here: https://gist.github.com/titipata/fd44572f7f6c3cc1dfbac05fb86f6081

like image 266
titipata Avatar asked May 18 '20 16:05

titipata


People also ask

How do you merge nearby bounding boxes?

If this distance is lower than a threshold you create a new bounding box that has the coordinates of left-top point with the lower values of x and y of the two boxes and the coordinates of the right-bottom point with the highest values of x and y of the two boxes.

How do you combine two bounding boxes in Matlab?

Increase the size of bounding boxes, make them overlap and try to merge. change the valuse expansionAmount in your code. By the way you totally copied the exaple code from matlab. And only use either regionprops or stoke width variation to remove non text region , the code you copied using both.


1 Answers

input:

out = [{'description': 'บริษัทไปรษณีย์ไทย',
  'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
 {'description': 'จํากัด',
 'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
 {'description': 'RP',
  'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
 {'description': '8147',
  'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
 {'description': '3609',
  'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
 {'description': '7',
  'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]}
]
  • I assumed, the 4 tuples represent x, y co-ordinates of the upper left, upper right, lower right and lower left co-ordinates respectively (in order).

  • First, we need to find all the bbox pairs which are close in the x direction, and almost same in the y direction (location at same height). N.B: You may need to tune the two thresholds if something is missed.

import numpy as np

pairs = []

threshold_y = 4 # height threshold
threshold_x = 20 # x threshold

for i in range(len(out)):
    for j in range(i+1, len(out)):
        left_upi, right_upi, right_lowi, left_lowi = out[i]['vertices']
        left_upj, right_upj, right_lowj, left_lowj = out[j]['vertices']
        # first of all, they should be in the same height range, starting Y axis should be almost same
        # their starting x axis is close upto a threshold
        cond1 = (abs(left_upi[1] - left_upj[1]) < threshold_y)
        cond2 = (abs(right_upi[0] - left_upj[0]) < threshold_x)
        cond3 = (abs(right_upj[0] - left_upi[0]) < threshold_x)

        if cond1 and (cond2 or cond3):
            pairs.append([i,j])

out:

pairs
[[0, 1], [2, 3], [3, 4], [4, 5]]
  • Now, we just have the pairs, but we need to find all the connected components too, for example, we know 0, 1 are in one component, and 2, 3, 4, 5 are in another component. (Usually, graph algorithms are most suitable for this task, but to make things simple, I did an iterative search)
merged_pairs = []

for i in range(len(pairs)):
    cur_set = set()
    p = pairs[i]

    done = False
    for k in range(len(merged_pairs)):
        if p[0] in merged_pairs[k]:
            merged_pairs[k].append(p[1])
            done = True
        if p[1] in merged_pairs[k]:
            merged_pairs[k].append(p[0])
            done = True

    if done:
        continue

    cur_set.add(p[0])
    cur_set.add(p[1])

    match_idx = []
    while True:
        num_match = 0
        for j in range(i+1, len(pairs)):
            p2 = pairs[j]

            if j not in match_idx and (p2[0] in cur_set or p2[1] in cur_set):
                cur_set.add(p2[0])
                cur_set.add(p2[1])
                num_match += 1
                match_idx.append(j)

        if num_match == 0:
            break
    merged_pairs.append(list(cur_set))

merged_pairs = [list(set(a)) for a in merged_pairs]

out:

merged_pairs
[[0, 1], [2, 3, 4, 5]]

Alternative networkx solution:

(much shorter if you don't mind using additional networkx import)

import networkx as nx

g = nx.Graph()
g.add_edges_from([[0, 1], [2, 3], [3, 4], [4, 5]]) # pass pairs here

gs = [list(a) for a in list(nx.connected_components(g))] # get merged pairs here
print(gs)

[[0, 1], [2, 3, 4, 5]]

  • Now, we have all the connected components, we can sort them based on their starting x co-ordinates, and merge the bounding boxes.
# for connected components, sort them according to x-axis and merge

out_final = []

INF = 999999999 # a large number greater than any co-ordinate
for idxs in merged_pairs:
    c_bbox = []

    for i in idxs:
        c_bbox.append(out[i])

    sorted_x = sorted(c_bbox, key =  lambda x: x['vertices'][0][0])

    new_sol = {}
    new_sol['description'] = ''
    new_sol['vertices'] = [[INF, INF], [-INF, INF], [-INF, -INF], [INF, -INF]]
    for k in sorted_x:
        new_sol['description'] += k['description']

        new_sol['vertices'][0][0] = min(new_sol['vertices'][0][0], k['vertices'][0][0])
        new_sol['vertices'][0][1] = min(new_sol['vertices'][0][1], k['vertices'][0][1])

        new_sol['vertices'][1][0] = max(new_sol['vertices'][1][0], k['vertices'][1][0])
        new_sol['vertices'][1][1] = min(new_sol['vertices'][1][1], k['vertices'][1][1])


        new_sol['vertices'][2][0] = max(new_sol['vertices'][2][0], k['vertices'][2][0])
        new_sol['vertices'][2][1] = max(new_sol['vertices'][2][1], k['vertices'][2][1])        

        new_sol['vertices'][3][0] = min(new_sol['vertices'][3][0], k['vertices'][3][0])
        new_sol['vertices'][3][1] = max(new_sol['vertices'][3][1], k['vertices'][3][1])  


    out_final.append(new_sol)

final_out:

out_final
[{'description': 'บริษัทไปรษณีย์ไทยจํากัด',
  'vertices': [[528, 202], [809, 202], [809, 222], [528, 222]]},
 {'description': 'RP814736097',
  'vertices': [[729, 1072], [911, 1072], [911, 1093], [729, 1093]]}]
like image 174
Zabir Al Nazi Avatar answered Oct 09 '22 01:10

Zabir Al Nazi