I am using Google Vision API, primarily to extract texts. I works fine, but for specific cases where I would need the API to scan the enter line, spits out the text before moving to the next line. However, it appears that the API is using some kind of logic that makes it scan top to bottom on the left side and moving to right side and doing a top to bottom scan. I would have liked if the API read left-to-right, move down and so on.
For example, consider the image:
The API returns the text like this:
“ Name DOB Gender: Lives In John Doe 01-Jan-1970 LA ”
Whereas, I would have expected something like this:
“ Name: John Doe DOB: 01-Jan-1970 Gender: M Lives In: LA ”
I suppose there is a way to define the block size or margin setting (?) to read the image/scan line by line?
Thanks for your help. Alex
Text extractors use AI to identify and extract relevant or notable pieces of information from within documents or online resources. Most simply, text extraction pulls important words from written texts and images.
Extracted Text means collecting text from files that include text in their original native file format or after generating text files from Optical Character Recognition (OCR) software for the purpose of indexing the text into an application for search and retrieval purposes.
OCR (Optical Character Recognition) is an electronic computer-based approach to convert images of text into machine-encoded text, which can then be extracted and used in text format.
This might be a late answer but adding it for future reference. You can add feature hints to your JSON request to get the desired results.
{
"requests": [
{
"image": {
"source": {
"imageUri": "https://i.stack.imgur.com/TRTXo.png"
}
},
"features": [
{
"type": "DOCUMENT_TEXT_DETECTION"
}
]
}
]
}
For text which are very far apart the DOCUMENT_TEXT_DETECTION also does not provide proper line segmentation.
The following code does simple line segmentation based on the character polygon coordinates.
https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision
Here a simple code to read line by line. y-axis for lines and x-axis for each word in the line.
items = []
lines = {}
for text in response.text_annotations[1:]:
top_x_axis = text.bounding_poly.vertices[0].x
top_y_axis = text.bounding_poly.vertices[0].y
bottom_y_axis = text.bounding_poly.vertices[3].y
if top_y_axis not in lines:
lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]
for s_top_y_axis, s_item in lines.items():
if top_y_axis < s_item[0][1]:
lines[s_top_y_axis][1].append((top_x_axis, text.description))
break
for _, item in lines.items():
if item[1]:
words = sorted(item[1], key=lambda t: t[0])
items.append((item[0], ' '.join([word for _, word in words]), words))
print(items)
You can extract the text based on the bounds per line too, you can use boundyPoly and concatenate the text in the same line
"boundingPoly": {
"vertices": [
{
"x": 87,
"y": 148
},
{
"x": 411,
"y": 148
},
{
"x": 411,
"y": 206
},
{
"x": 87,
"y": 206
}
]
for example this 2 words are in the same "line"
"description": "you",
"boundingPoly": {
"vertices": [
{
"x": 362,
"y": 1406
},
{
"x": 433,
"y": 1406
},
{
"x": 433,
"y": 1448
},
{
"x": 362,
"y": 1448
}
]
}
},
{
"description": "start",
"boundingPoly": {
"vertices": [
{
"x": 446,
"y": 1406
},
{
"x": 540,
"y": 1406
},
{
"x": 540,
"y": 1448
},
{
"x": 446,
"y": 1448
}
]
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With