Text extraction - line-by-line

Tags:

google-vision

google-cloud-vision

I am using the Google Vision API, primarily to extract text. It works fine, but in some cases I need the API to scan an entire line and output its text before moving on to the next line. Instead, the API appears to scan top to bottom on the left side, then move to the right side and scan top to bottom again. I would like it to read left to right, then move down, and so on.

For example, consider the image:

(Image: https://i.stack.imgur.com/wk3t1.png)

The API returns the text like this:

“ Name DOB Gender: Lives In John Doe 01-Jan-1970 LA ”

Whereas, I would have expected something like this:

“ Name: John Doe DOB: 01-Jan-1970 Gender: M Lives In: LA ”

Is there a way to define a block size or margin setting so that the image is read line by line?

Thanks for your help. Alex

asked Feb 22 '17 12:02

Alagappan Narayanan

3 Answers

This might be a late answer, but I am adding it for future reference. You can add feature hints to your JSON request to get the desired results.

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://i.stack.imgur.com/TRTXo.png"
        }
      },
      "features": [
        {
          "type": "DOCUMENT_TEXT_DETECTION"
        }
      ]
    }
  ]
}
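
A request like the one above could be sent from Python roughly as follows. This is a minimal sketch using only the standard library. It assumes authentication with an API key passed as the `key` query parameter, and `build_request` and `annotate` are hypothetical helper names, not part of any Google library.

```python
import json
import urllib.request

# Public REST endpoint for image annotation (v1).
ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_uri, feature_type="DOCUMENT_TEXT_DETECTION"):
    """Build the JSON body for a single-image annotate request."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": feature_type}],
            }
        ]
    }


def annotate(image_uri, api_key):
    """POST the request and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_request(image_uri)).encode("utf-8")
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Only prints the request body; call annotate(...) with a real key to send it.
    print(json.dumps(build_request("https://i.stack.imgur.com/TRTXo.png"), indent=2))
```

Keeping the body construction separate from the network call makes it easy to inspect the request or to switch the feature type to `TEXT_DETECTION` for comparison.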

For text that is spaced very far apart, DOCUMENT_TEXT_DETECTION does not provide proper line segmentation either.

The code in the repository linked below does simple line segmentation based on the character polygon coordinates.

https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision

answered Oct 17 '22 01:10

Nirojan Selvanathan

Here is some simple code to read line by line. It uses the y-axis to find the lines and the x-axis to order the words within each line.

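# `response` is assumed to be the result of a Vision API text detection call,
# for example client.text_detection(image=image) with the Python client.
items = []
lines = {}  # top y of a line -> [(top y, bottom y), [(x, word), ...]]

# The first annotation holds the full text, so skip it and iterate over the words.
for text in response.text_annotations[1:]:
    top_x_axis = text.bounding_poly.vertices[0].x
    top_y_axis = text.bounding_poly.vertices[0].y
    bottom_y_axis = text.bounding_poly.vertices[3].y

    # Register a new candidate line for every unseen top y value.
    if top_y_axis not in lines:
        lines[top_y_axis] = [(top_y_axis, bottom_y_axis), []]

    # Attach the word to the first line whose bottom edge lies below the word's top.
    # This relies on lines having been registered in top-to-bottom order.
    for s_top_y_axis, s_item in lines.items():
        if top_y_axis < s_item[0][1]:
            lines[s_top_y_axis][1].append((top_x_axis, text.description))
            break

# Candidate lines that never received a word are skipped.
for item in lines.values():
    if item[1]:
        # Sort the words of a line from left to right.
        words = sorted(item[1], key=lambda t: t[0])
        items.append((item[0], ' '.join(word for _, word in words), words))

print(items)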

answered Oct 17 '22 00:10

Gino

You can also extract the text line by line from the bounds: use boundingPoly and concatenate the words that sit on the same line.

"boundingPoly": {
        "vertices": [
          {
            "x": 87,
            "y": 148
          },
          {
            "x": 411,
            "y": 148
          },
          {
            "x": 411,
            "y": 206
          },
          {
            "x": 87,
            "y": 206
          }
        ]
}

For example, these two words are on the same "line": their boxes share the same top (1406) and bottom (1448) y values.

    {
      "description": "you",
      "boundingPoly": {
        "vertices": [
          {
            "x": 362,
            "y": 1406
          },
          {
            "x": 433,
            "y": 1406
          },
          {
            "x": 433,
            "y": 1448
          },
          {
            "x": 362,
            "y": 1448
          }
        ]
      }
    },
    {
      "description": "start",
      "boundingPoly": {
        "vertices": [
          {
            "x": 446,
            "y": 1406
          },
          {
            "x": 540,
            "y": 1406
          },
          {
            "x": 540,
            "y": 1448
          },
          {
            "x": 446,
            "y": 1448
          }
        ]
      }
    }
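
The idea of concatenating words that share a line can be sketched as below. This is one possible approach, not something the Vision API provides: `box` and `group_into_lines` are hypothetical helpers, they take word annotations as plain dicts in the JSON shape shown above, and they assume roughly horizontal, unrotated text. A word joins an existing line when the vertical centre of its box falls inside that line's top and bottom bounds.

```python
def box(word):
    """Return (left, top, bottom) of a word's boundingPoly."""
    vertices = word["boundingPoly"]["vertices"]
    # A coordinate may be missing from the JSON when its value is 0, so default to 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return min(xs), min(ys), max(ys)


def group_into_lines(words):
    """Group word annotations into text lines, top to bottom, left to right."""
    lines = []  # each entry: {"top": int, "bottom": int, "words": [(left, text)]}
    # Process words from top to bottom so each line is opened by its highest word.
    for word in sorted(words, key=lambda w: box(w)[1]):
        left, top, bottom = box(word)
        center = (top + bottom) / 2
        for line in lines:
            if line["top"] <= center <= line["bottom"]:
                line["words"].append((left, word["description"]))
                break
        else:
            # No existing line contains this word's centre: start a new line.
            lines.append({"top": top, "bottom": bottom,
                          "words": [(left, word["description"])]})
    lines.sort(key=lambda ln: ln["top"])
    # Within each line, order the words by their left edge.
    return [" ".join(text for _, text in sorted(line["words"])) for line in lines]


if __name__ == "__main__":
    sample = [
        {"description": "start", "boundingPoly": {"vertices": [
            {"x": 446, "y": 1406}, {"x": 540, "y": 1406},
            {"x": 540, "y": 1448}, {"x": 446, "y": 1448}]}},
        {"description": "you", "boundingPoly": {"vertices": [
            {"x": 362, "y": 1406}, {"x": 433, "y": 1406},
            {"x": 433, "y": 1448}, {"x": 362, "y": 1448}]}},
    ]
    print(group_into_lines(sample))  # ['you start']
```

Comparing the centre against the line bounds, rather than requiring identical y values, tolerates the small vertical offsets that OCR boxes usually have. Skewed or rotated scans would need a more careful comparison.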

answered Oct 17 '22 00:10

Javier
