Here, I use Google Vision API to detect text from the following image. The red box indicates samples of a combined bounding box that I would like to obtain.
Basically, I get the text output and bounding box from the above image. Here, I would like to merge the bounding boxes and texts that are located along the same row (left to right). For example, the first line will get merged together:
[{'description': 'บริษัทไปรษณีย์ไทย',
'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
{'description': 'จํากัด',
'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
...
to
[{'description': 'บริษัทไปรษณีย์ไทยจำกัด',
'vertices': [(528, 202), (809, 202), (809, 222), (528, 222)]},
...
These following rows
{'description': 'RP',
'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
{'description': '8147',
'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
{'description': '3609',
'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
{'description': '7',
'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]},
{'description': 'TH',
will get merged together.
I looked into - Solution using OpenCV - Non-max suppression algorithm
but cannot produce one specific for my need since it relies on percentage of overlapping pixels. If someone can help, that would be great!
Please try to use bounding box data here: https://gist.github.com/titipata/fd44572f7f6c3cc1dfbac05fb86f6081
If this distance is lower than a threshold you create a new bounding box that has the coordinates of left-top point with the lower values of x and y of the two boxes and the coordinates of the right-bottom point with the highest values of x and y of the two boxes.
Increase the size of bounding boxes, make them overlap and try to merge. change the valuse expansionAmount in your code. By the way you totally copied the exaple code from matlab. And only use either regionprops or stoke width variation to remove non text region , the code you copied using both.
input:
out = [{'description': 'บริษัทไปรษณีย์ไทย',
'vertices': [(528, 202), (741, 202), (741, 222), (528, 222)]},
{'description': 'จํากัด',
'vertices': [(754, 204), (809, 204), (809, 222), (754, 222)]},
{'description': 'RP',
'vertices': [(729, 1072), (758, 1072), (758, 1091), (729, 1091)]},
{'description': '8147',
'vertices': [(768, 1072), (822, 1072), (822, 1092), (768, 1092)]},
{'description': '3609',
'vertices': [(834, 1073), (889, 1073), (889, 1093), (834, 1093)]},
{'description': '7',
'vertices': [(900, 1073), (911, 1073), (911, 1092), (900, 1092)]}
]
I assumed, the 4 tuples represent x, y co-ordinates of the upper left, upper right, lower right and lower left co-ordinates respectively (in order).
First, we need to find all the bbox pairs which are close in the x direction, and almost same in the y direction (location at same height). N.B: You may need to tune the two thresholds if something is missed.
import numpy as np
pairs = []
threshold_y = 4 # height threshold
threshold_x = 20 # x threshold
for i in range(len(out)):
for j in range(i+1, len(out)):
left_upi, right_upi, right_lowi, left_lowi = out[i]['vertices']
left_upj, right_upj, right_lowj, left_lowj = out[j]['vertices']
# first of all, they should be in the same height range, starting Y axis should be almost same
# their starting x axis is close upto a threshold
cond1 = (abs(left_upi[1] - left_upj[1]) < threshold_y)
cond2 = (abs(right_upi[0] - left_upj[0]) < threshold_x)
cond3 = (abs(right_upj[0] - left_upi[0]) < threshold_x)
if cond1 and (cond2 or cond3):
pairs.append([i,j])
out:
pairs
[[0, 1], [2, 3], [3, 4], [4, 5]]
merged_pairs = []
for i in range(len(pairs)):
cur_set = set()
p = pairs[i]
done = False
for k in range(len(merged_pairs)):
if p[0] in merged_pairs[k]:
merged_pairs[k].append(p[1])
done = True
if p[1] in merged_pairs[k]:
merged_pairs[k].append(p[0])
done = True
if done:
continue
cur_set.add(p[0])
cur_set.add(p[1])
match_idx = []
while True:
num_match = 0
for j in range(i+1, len(pairs)):
p2 = pairs[j]
if j not in match_idx and (p2[0] in cur_set or p2[1] in cur_set):
cur_set.add(p2[0])
cur_set.add(p2[1])
num_match += 1
match_idx.append(j)
if num_match == 0:
break
merged_pairs.append(list(cur_set))
merged_pairs = [list(set(a)) for a in merged_pairs]
out:
merged_pairs
[[0, 1], [2, 3, 4, 5]]
Alternative networkx solution:
(much shorter if you don't mind using additional networkx
import)
import networkx as nx
g = nx.Graph()
g.add_edges_from([[0, 1], [2, 3], [3, 4], [4, 5]]) # pass pairs here
gs = [list(a) for a in list(nx.connected_components(g))] # get merged pairs here
print(gs)
[[0, 1], [2, 3, 4, 5]]
# for connected components, sort them according to x-axis and merge
out_final = []
INF = 999999999 # a large number greater than any co-ordinate
for idxs in merged_pairs:
c_bbox = []
for i in idxs:
c_bbox.append(out[i])
sorted_x = sorted(c_bbox, key = lambda x: x['vertices'][0][0])
new_sol = {}
new_sol['description'] = ''
new_sol['vertices'] = [[INF, INF], [-INF, INF], [-INF, -INF], [INF, -INF]]
for k in sorted_x:
new_sol['description'] += k['description']
new_sol['vertices'][0][0] = min(new_sol['vertices'][0][0], k['vertices'][0][0])
new_sol['vertices'][0][1] = min(new_sol['vertices'][0][1], k['vertices'][0][1])
new_sol['vertices'][1][0] = max(new_sol['vertices'][1][0], k['vertices'][1][0])
new_sol['vertices'][1][1] = min(new_sol['vertices'][1][1], k['vertices'][1][1])
new_sol['vertices'][2][0] = max(new_sol['vertices'][2][0], k['vertices'][2][0])
new_sol['vertices'][2][1] = max(new_sol['vertices'][2][1], k['vertices'][2][1])
new_sol['vertices'][3][0] = min(new_sol['vertices'][3][0], k['vertices'][3][0])
new_sol['vertices'][3][1] = max(new_sol['vertices'][3][1], k['vertices'][3][1])
out_final.append(new_sol)
final_out:
out_final
[{'description': 'บริษัทไปรษณีย์ไทยจํากัด',
'vertices': [[528, 202], [809, 202], [809, 222], [528, 222]]},
{'description': 'RP814736097',
'vertices': [[729, 1072], [911, 1072], [911, 1093], [729, 1093]]}]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With