
Is there an option in the Google Cloud Vision API to detect and return a table (rows and columns with headers) from a scanned image?

We are using the Google Cloud Vision API to extract invoice fields. Does the API support detecting tables of data, or do we have to write custom code to detect tables?

Ajinkya Mulay asked Dec 19 '25 22:12



1 Answer

The Google Cloud Vision API does not return table or form data in a structured way. However, the response does include the coordinates of the polygon that surrounds each piece of detected text (the boundingPoly). Take a look at this example:

{
    "description": "ABBEY",
    "boundingPoly": {
        "vertices": [
            {
                "x": 44,
                "y": 43
            }, ...
        ]
    }, ...
}

One approach is to determine the coordinates of each field on your invoice, then iterate through the boundingPoly objects in the JSON response and check whether the region spanned by their vertices overlaps, to some degree, the region of one of your fields. If a boundingPoly lies in the same region as a field, you can map its text to the field name, for example with a Python dictionary.
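The region-matching idea above can be sketched roughly as follows. This is a minimal illustration, not the API's own method: it assumes the annotations have already been parsed from the JSON response into a list of dicts shaped like the example, and the names `poly_to_box`, `overlap_fraction`, `map_words_to_fields`, the `field_regions` coordinates, and the 0.5 overlap threshold are all hypothetical choices made for this sketch.

```python
from typing import Dict, List, Tuple

# Axis-aligned box: (x_min, y_min, x_max, y_max), in image pixels.
Box = Tuple[int, int, int, int]


def poly_to_box(bounding_poly: dict) -> Box:
    """Collapse a boundingPoly's vertices into an axis-aligned box."""
    # .get(..., 0) is defensive: a coordinate may be absent from the JSON
    # when its value is 0 (assumption about the serialized response).
    xs = [v.get("x", 0) for v in bounding_poly["vertices"]]
    ys = [v.get("y", 0) for v in bounding_poly["vertices"]]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_fraction(word: Box, region: Box) -> float:
    """Return the fraction of the word box's area that lies inside region."""
    inter_w = max(0, min(word[2], region[2]) - max(word[0], region[0]))
    inter_h = max(0, min(word[3], region[3]) - max(word[1], region[1]))
    area = (word[2] - word[0]) * (word[3] - word[1])
    return (inter_w * inter_h) / area if area > 0 else 0.0


def map_words_to_fields(
    annotations: List[dict],
    field_regions: Dict[str, Box],
    min_overlap: float = 0.5,  # hypothetical threshold, tune per template
) -> Dict[str, List[str]]:
    """Assign each annotation's text to the first field region it overlaps."""
    fields: Dict[str, List[str]] = {name: [] for name in field_regions}
    for ann in annotations:
        box = poly_to_box(ann["boundingPoly"])
        for name, region in field_regions.items():
            if overlap_fraction(box, region) >= min_overlap:
                fields[name].append(ann["description"])
                break  # one field per word
    return fields


# Made-up sample data, for illustration only.
sample_annotations = [
    {"description": "ABBEY", "boundingPoly": {"vertices": [
        {"x": 44, "y": 43}, {"x": 110, "y": 43},
        {"x": 110, "y": 60}, {"x": 44, "y": 60}]}},
    {"description": "42.00", "boundingPoly": {"vertices": [
        {"x": 300, "y": 400}, {"x": 360, "y": 400},
        {"x": 360, "y": 420}, {"x": 300, "y": 420}]}},
    {"description": "ROAD", "boundingPoly": {"vertices": [
        {"x": 500, "y": 43}, {"x": 560, "y": 43},
        {"x": 560, "y": 60}, {"x": 500, "y": 60}]}},
]

# Hypothetical field regions measured once on a reference invoice.
sample_regions = {
    "vendor": (30, 30, 200, 80),
    "total": (280, 380, 400, 440),
}

result = map_words_to_fields(sample_annotations, sample_regions)
print(result)  # {'vendor': ['ABBEY'], 'total': ['42.00']}
```

Measuring overlap as a fraction of the word's own area, rather than requiring full containment, tolerates small shifts between scans. For a table, the same idea could be applied with one region per column and the words in each column then grouped into rows by their y coordinates, though that only works when the layout is known in advance.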

Christopher P answered Dec 21 '25 16:12



