I am using the Google Vision API to perform text recognition on receipt images. I am getting some nice results, but the format of the response is quite unreliable. If there is a large gap between pieces of text, the readout prints the line below instead of the text next to it.
For example, with the following receipt image I get the response below:
4x Löwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3,30 26,401
3x Hefe-Weissbier a 3,30 9,90 1
1x Saft 0,25
1x Grosses Wasser
1x Vegetarische Varia
1x Gyros
1x Baby Kalamari Gefu
2x Gyros Folie
1x Schafskäse Ofen
1x Bifteki Metaxa
1x Schweinefilet Meta
1x St ifado
1x Tee
2,50 1
2,40 1
9,90 1
8,90 1
12,90
a 9,9019,80 1
6,90 1
11,90 1
13,90 1
14,90 1
2,10 1
Which starts off well and as expected, but then becomes fairly unhelpful when trying to connect prices to text. The ideal response would be as follows:
4x Löwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3,30 26,401
3x Hefe-Weissbier a 3,30 9,90 1
1x Saft 0,25 2,50 1
1x Grosses Wasser 2,40 1
1x Vegetarische Varia 9,90 1
1x Gyros 8,90 1
1x Baby Kalamari Gefu 12,90 1
2x Gyros Folie a 9,9019,80 1
1x Schafskäse Ofen 6,90 1
1x Bifteki Metaxa 11,90 1
1x Schweinefilet Meta 13,90 1
1x St ifado 14,90 1
1x Tee 2,10 1
Or close to that.
Is there a formatting request you can add to the API to get different responses? I have had success with Tesseract, where you can change the output format to achieve this result, and I was wondering if the Vision API has something similar.
I understand the API returns letter coordinates, which could be used, but I was hoping not to have to go into that kind of depth.
This might be a late answer, but I am adding it for future reference. For text that is very far apart, DOCUMENT_TEXT_DETECTION also does not provide proper line segmentation.
The following repository contains simple line-segmentation code based on the character polygon coordinates:
https://github.com/sshniro/line-segmentation-algorithm-to-gcp-vision
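The core idea can be sketched in a few lines: group words whose bounding polygons sit at roughly the same height, then sort each group left to right. This is my own minimal illustration, not the linked repository's code; the field names mimic the Vision API's `boundingBox`/`vertices` shape, and the pixel tolerance is an assumed value you would tune per image.

```python
# Minimal line-segmentation sketch from word bounding polygons.
# Assumptions: each word is {"text": ..., "boundingBox": {"vertices": [{x, y}, ...]}},
# and words on the same physical line have vertical midpoints within `tolerance` px.

def midpoint_y(box):
    """Average y of a bounding polygon's vertices."""
    ys = [v["y"] for v in box["vertices"]]
    return sum(ys) / len(ys)

def segment_lines(words, tolerance=10):
    """Group words into lines by vertical midpoint, then order each line by x."""
    groups = []  # list of (anchor_y, [words])
    for word in sorted(words, key=lambda w: midpoint_y(w["boundingBox"])):
        y = midpoint_y(word["boundingBox"])
        if groups and abs(groups[-1][0] - y) <= tolerance:
            groups[-1][1].append(word)   # same line: close enough vertically
        else:
            groups.append((y, [word]))   # start a new line
    lines = []
    for _, group in groups:
        # left-to-right order within the line, using the leftmost x of each word
        group.sort(key=lambda w: min(v["x"] for v in w["boundingBox"]["vertices"]))
        lines.append(" ".join(w["text"] for w in group))
    return lines
```

With this, a price printed far to the right of "1x Tee" but at the same height ends up on the same output line, which is exactly the pairing the raw response loses.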
You can add feature hints to your JSON request. For an image of a receipt like this, DOCUMENT_TEXT_DETECTION gives good results:
{
"requests": [
{
"image": {
"source": {
"imageUri": "https://i.stack.imgur.com/TRTXo.png"
}
},
"features": [
{
"type": "DOCUMENT_TEXT_DETECTION"
}
]
}
]
}
You can copy the above JSON and paste it into the Request Body field in the Try this API pane on the documentation page. Result:
4x LOwenbräu Original a 3,00 12,00 1
8x Weissbier dunkel a 3, 3026, 40 1
3x Hefe-Weissbier a 3,30990 1
1x Saft 0,25 2, 50 1
1x Grosses Wasser 2, 40 1
1x Vegetarische Varia 9,90 1
1x Gyros 8,90 1
1x Baby Kalamari Gefu 12,90 !
2x Gyros Folie a 9,9019, 80 1
1x Schaf skäse Ofen 6,90 1
1x Bifteki Metaxa 11,90 1
1x Schweinefilet Meta 13,90 1
1x Stifado 14, 90 1
1x Tee 2, 10 1
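To send the same request programmatically rather than through the Try this API pane, you can POST it to the `images:annotate` REST endpoint. This is a sketch using only the standard library; `YOUR_API_KEY` is a placeholder for a real Cloud Vision API key, and the response path `responses[0].fullTextAnnotation.text` assumes a successful DOCUMENT_TEXT_DETECTION result.

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your own Cloud Vision API key

def build_request(image_uri):
    """Build the same JSON body as shown above for a single remote image."""
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }

def annotate(image_uri):
    """POST the request to images:annotate and return the recognized text."""
    url = "https://vision.googleapis.com/v1/images:annotate?key=" + API_KEY
    req = urllib.request.Request(
        url,
        data=json.dumps(build_request(image_uri)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["responses"][0]["fullTextAnnotation"]["text"]

# Usage (needs a valid key and network access):
# print(annotate("https://i.stack.imgur.com/TRTXo.png"))
```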
Google Vision is much less configurable than Tesseract at the moment. Since Google is behind both projects, guess which one is going to get higher priority in the future?