Combining nearby bounding boxes along one axis

Tags:

Here, I use Google Vision API to detect text from the following image. The red box indicates samples of a combined bounding box that I would like to obtain.

enter image description here

Basically, I get the text output and bounding box from the above image. Here, I would like to merge the bounding boxes and texts that are located along the same row (left to right). For example, the first line will get merged together:

[{'description': 'บริษัทไปรษณีย์ไทย',
  'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
 {'description': 'จํากัด',
 'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
 ...

[{'description': 'บริษัทไปรษณีย์ไทยจำกัด',
  'vertices': [(528, 202), (809, 202), (809, 222), (528, 222)]},
 ...

These following rows

 {'description': 'RP',
  'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
 {'description': '8147',
  'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
 {'description': '3609',
  'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
 {'description': '7',
  'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]},
 {'description': 'TH',

will get merged together.

Current approach

I looked into - Solution using OpenCV - Non-max suppression algorithm

but cannot produce one specific for my need since it relies on percentage of overlapping pixels. If someone can help, that would be great!

Please try to use bounding box data here: https://gist.github.com/titipata/fd44572f7f6c3cc1dfbac05fb86f6081

266

asked May 18 '20 16:05

titipata

1 Answers

input:

out = [{'description': 'บริษัทไปรษณีย์ไทย',
  'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
 {'description': 'จํากัด',
 'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
 {'description': 'RP',
  'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
 {'description': '8147',
  'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
 {'description': '3609',
  'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
 {'description': '7',
  'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]}
]

I assumed, the 4 tuples represent x, y co-ordinates of the upper left, upper right, lower right and lower left co-ordinates respectively (in order).
First, we need to find all the bbox pairs which are close in the x direction, and almost same in the y direction (location at same height). N.B: You may need to tune the two thresholds if something is missed.

import numpy as np

pairs = []

threshold_y = 4 # height threshold
threshold_x = 20 # x threshold

for i in range(len(out)):
    for j in range(i+1, len(out)):
        left_upi, right_upi, right_lowi, left_lowi = out[i]['vertices']
        left_upj, right_upj, right_lowj, left_lowj = out[j]['vertices']
        # first of all, they should be in the same height range, starting Y axis should be almost same
        # their starting x axis is close upto a threshold
        cond1 = (abs(left_upi[1] - left_upj[1]) < threshold_y)
        cond2 = (abs(right_upi[0] - left_upj[0]) < threshold_x)
        cond3 = (abs(right_upj[0] - left_upi[0]) < threshold_x)

        if cond1 and (cond2 or cond3):
            pairs.append([i,j])

out:

pairs
[[0, 1], [2, 3], [3, 4], [4, 5]]

Now, we just have the pairs, but we need to find all the connected components too, for example, we know 0, 1 are in one component, and 2, 3, 4, 5 are in another component. (Usually, graph algorithms are most suitable for this task, but to make things simple, I did an iterative search)

merged_pairs = []

for i in range(len(pairs)):
    cur_set = set()
    p = pairs[i]

    done = False
    for k in range(len(merged_pairs)):
        if p[0] in merged_pairs[k]:
            merged_pairs[k].append(p[1])
            done = True
        if p[1] in merged_pairs[k]:
            merged_pairs[k].append(p[0])
            done = True

    if done:
        continue

    cur_set.add(p[0])
    cur_set.add(p[1])

    match_idx = []
    while True:
        num_match = 0
        for j in range(i+1, len(pairs)):
            p2 = pairs[j]

            if j not in match_idx and (p2[0] in cur_set or p2[1] in cur_set):
                cur_set.add(p2[0])
                cur_set.add(p2[1])
                num_match += 1
                match_idx.append(j)

        if num_match == 0:
            break
    merged_pairs.append(list(cur_set))

merged_pairs = [list(set(a)) for a in merged_pairs]

out:

merged_pairs
[[0, 1], [2, 3, 4, 5]]

Alternative networkx solution:

(much shorter if you don't mind using additional networkx import)

import networkx as nx

g = nx.Graph()
g.add_edges_from([[0, 1], [2, 3], [3, 4], [4, 5]]) # pass pairs here

gs = [list(a) for a in list(nx.connected_components(g))] # get merged pairs here
print(gs)

[[0, 1], [2, 3, 4, 5]]

Now, we have all the connected components, we can sort them based on their starting x co-ordinates, and merge the bounding boxes.

# for connected components, sort them according to x-axis and merge

out_final = []

INF = 999999999 # a large number greater than any co-ordinate
for idxs in merged_pairs:
    c_bbox = []

    for i in idxs:
        c_bbox.append(out[i])

    sorted_x = sorted(c_bbox, key =  lambda x: x['vertices'][0][0])

    new_sol = {}
    new_sol['description'] = ''
    new_sol['vertices'] = [[INF, INF], [-INF, INF], [-INF, -INF], [INF, -INF]]
    for k in sorted_x:
        new_sol['description'] += k['description']

        new_sol['vertices'][0][0] = min(new_sol['vertices'][0][0], k['vertices'][0][0])
        new_sol['vertices'][0][1] = min(new_sol['vertices'][0][1], k['vertices'][0][1])

        new_sol['vertices'][1][0] = max(new_sol['vertices'][1][0], k['vertices'][1][0])
        new_sol['vertices'][1][1] = min(new_sol['vertices'][1][1], k['vertices'][1][1])


        new_sol['vertices'][2][0] = max(new_sol['vertices'][2][0], k['vertices'][2][0])
        new_sol['vertices'][2][1] = max(new_sol['vertices'][2][1], k['vertices'][2][1])        

        new_sol['vertices'][3][0] = min(new_sol['vertices'][3][0], k['vertices'][3][0])
        new_sol['vertices'][3][1] = max(new_sol['vertices'][3][1], k['vertices'][3][1])  


    out_final.append(new_sol)

final_out:

out_final
[{'description': 'บริษัทไปรษณีย์ไทยจํากัด',
  'vertices': [[528, 202], [809, 202], [809, 222], [528, 222]]},
 {'description': 'RP814736097',
  'vertices': [[729, 1072], [911, 1072], [911, 1093], [729, 1093]]}]

174

answered Oct 09 '22 01:10

Zabir Al Nazi

Related questions
                            
                                Right align the rowname_col in gt
                            
                                Matplotlib - Combine text/annotation coordinate systems
                            
                                Unable to look up TXT record for host Cluster0-XXXXX.mongodb.net
                            
                                Swift Combine how Set<AnyCancellable> works?
                            
                                Wrong position of <video> element inside SVG foreignObject on Safari
                            
                                What is the difference between Threads.@spawn and Threads.@threads?
                            
                                Pearsonr: TypeError: No loop matching the specified signature and casting was found for ufunc add
                            
                                How to get branch name when running pipeline on tag?
                            
                                How to limit and skip after grouping in a GraphQL query to paginate the tags list page?
                            
                                How can I get only rgb of plt.imread?
                            
                                In a dependently typed programming language is Type-in-Type practical for programming?
                            
                                Mysterious crash inside CoreAnalytics::Client / FrameworkConfiguration::init

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Combining nearby bounding boxes along one axis

Tags:

python

python-3.x

opencv

google-vision

Current approach

titipata

People also ask

1 Answers

Zabir Al Nazi

Recent Activity

Donate For Us