We are using the Google Cloud Vision API to extract invoice fields. Does the API support detecting tables of data, or do we have to write custom code to detect tables?
The Google Vision API does not return data from forms in a structured way. However, the response does include the coordinates of the polygon that surrounds each piece of detected text (the boundingPoly). Take a look at this example:
{
  "description": "ABBEY",
  "boundingPoly": {
    "vertices": [
      {
        "x": 44,
        "y": 43
      }, ...
    ]
  }, ...
}
One approach is to determine the coordinates of each field on your invoice, then write some code that iterates through the boundingPoly objects in the JSON response and checks whether the region covered by the vertices overlaps, to some degree, with the region of one of your fields. If a boundingPoly falls in the same region as a field, you can map its text to that field name, for example with a dictionary in Python.
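The approach above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function names, the `field_regions` layout, the 0.5 overlap threshold, and all sample coordinates other than the first ABBEY vertex are assumptions made up for the example. It expects a list of per-word annotations that have already been parsed from the JSON response, each with a `description` and a `boundingPoly` as shown earlier.

```python
def bounding_box(vertices):
    """Return (x_min, y_min, x_max, y_max) for a list of vertex dicts.

    A missing "x" or "y" key is treated as 0 (an assumption about how
    zero-valued coordinates may appear in the JSON).
    """
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def overlap_ratio(word_box, field_box):
    """Fraction of the word's box area that lies inside the field's box."""
    wx1, wy1, wx2, wy2 = word_box
    fx1, fy1, fx2, fy2 = field_box
    inter_w = min(wx2, fx2) - max(wx1, fx1)
    inter_h = min(wy2, fy2) - max(wy1, fy1)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    word_area = (wx2 - wx1) * (wy2 - wy1)
    return (inter_w * inter_h) / word_area if word_area else 0.0


def map_words_to_fields(annotations, field_regions, min_overlap=0.5):
    """Assign each word to the first field region it sufficiently overlaps."""
    fields = {name: [] for name in field_regions}
    for ann in annotations:
        box = bounding_box(ann["boundingPoly"]["vertices"])
        for name, region in field_regions.items():
            if overlap_ratio(box, region) >= min_overlap:
                fields[name].append(ann["description"])
                break
    # Join the words of each field in the order they appeared.
    return {name: " ".join(words) for name, words in fields.items()}


def rect(x1, y1, x2, y2):
    """Helper for building hypothetical sample polygons."""
    return {"vertices": [{"x": x1, "y": y1}, {"x": x2, "y": y1},
                         {"x": x2, "y": y2}, {"x": x1, "y": y2}]}


# Hypothetical field regions measured on the invoice template,
# given as (x_min, y_min, x_max, y_max) in image pixels.
field_regions = {
    "vendor": (30, 30, 300, 80),
    "total": (400, 500, 600, 550),
}

# Hypothetical word annotations in the shape shown in the response above.
sample_annotations = [
    {"description": "ABBEY", "boundingPoly": rect(44, 43, 150, 70)},
    {"description": "ROAD", "boundingPoly": rect(160, 43, 240, 70)},
    {"description": "42.50", "boundingPoly": rect(420, 510, 500, 540)},
    {"description": "Page", "boundingPoly": rect(10, 700, 60, 720)},
]

result = map_words_to_fields(sample_annotations, field_regions)
print(result)
```

Using the share of the word's own box that falls inside the field, rather than requiring full containment, tolerates small shifts between scans. The threshold and the field regions would need tuning for your own invoices, and words that match no field (like "Page" here) are simply dropped.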