Amazon Textract missing text in particular location

Question

I'm running some documents through Textract and there's one particular case where it fails to read some of the text: the main text is oriented in one direction, and the small tidbit that I need is oriented in another. I've attached an image showing an example.

For reference, the "Picks this up" text says "Page: 1 of 2". Is there a workaround for this? It's a rare edge case, so I'm fine with an inefficient solution.

Thomas · Accepted Answer

As of 2023-03-02, Textract does not support text in multiple orientations within a single document. Textract makes a best guess at the orientation of the document and then detects the text in that orientation.

However, there are workarounds:

For example, you could call Textract once, white out the text it returns, and send the image again. It will then orient the document according to the remaining words, which are the ones oriented differently.
You can also call the Amazon Rekognition DetectText API. It recognizes words in any orientation, but it has a limit of 100 words per image.
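
The white-out workaround could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: it assumes the document is a single image available as bytes, that Pillow is installed, and that AWS credentials are configured. The helper names `word_boxes_in_pixels` and `second_pass_text` are made up for illustration. Textract reports each bounding box as ratios of the page width and height, so the boxes are converted to pixels before drawing.

```python
import io


def word_boxes_in_pixels(blocks, width, height):
    """Convert Textract WORD bounding boxes (ratios of page size) to pixel boxes."""
    boxes = []
    for block in blocks:
        if block.get("BlockType") != "WORD":
            continue
        box = block["Geometry"]["BoundingBox"]
        left = int(box["Left"] * width)
        top = int(box["Top"] * height)
        # Add one pixel so rounding down does not leave a sliver of text visible.
        right = int((box["Left"] + box["Width"]) * width) + 1
        bottom = int((box["Top"] + box["Height"]) * height) + 1
        boxes.append((left, top, right, bottom))
    return boxes


def second_pass_text(image_bytes):
    """Run Textract, white out the words it found, then run Textract again."""
    # Imported here so the helper above works without boto3 or Pillow installed.
    import boto3
    from PIL import Image, ImageDraw

    textract = boto3.client("textract")
    first = textract.detect_document_text(Document={"Bytes": image_bytes})

    image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
    draw = ImageDraw.Draw(image)
    for box in word_boxes_in_pixels(first["Blocks"], image.width, image.height):
        draw.rectangle(box, fill="white")

    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    second = textract.detect_document_text(Document={"Bytes": buffer.getvalue()})
    # Whatever Textract reads now is text it skipped on the first pass.
    return [b["Text"] for b in second["Blocks"] if b["BlockType"] == "LINE"]
```

Since this is a rare edge case, the extra Textract call per document is probably acceptable; the second pass only needs to run when the expected tidbit is missing from the first result.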

Here is how to use the Amazon Rekognition API:

    import boto3

    # Placeholders: replace with your own S3 bucket and object key.
    bucket = 'my-bucket'
    document_name = 'my-document.png'

    session = boto3.Session(profile_name='default')
    client = session.client('rekognition')

    # DetectText returns LINE and WORD detections regardless of orientation.
    response = client.detect_text(Image={'S3Object': {'Bucket': bucket, 'Name': document_name}})

    for text in response['TextDetections']:
        print('Detected text: ' + text['DetectedText'])
        print('Confidence: {:.2f}%'.format(text['Confidence']))
        print('Type: ' + text['Type'])
        print()
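
To find only the tidbit Textract skipped, one option is to compare the two results and keep the words Rekognition found that Textract did not. The sketch below assumes you already have the `Blocks` list from a Textract response and the `TextDetections` list from the Rekognition response above. The function name `words_textract_missed` and the `min_confidence` threshold of 80 are illustrative choices, not part of either API.

```python
def words_textract_missed(textract_blocks, rekognition_detections, min_confidence=80.0):
    """Return words Rekognition detected that do not appear in the Textract output."""
    # Compare case-insensitively so minor casing differences are not reported.
    textract_words = {
        block["Text"].lower()
        for block in textract_blocks
        if block.get("BlockType") == "WORD"
    }
    missed = []
    for detection in rekognition_detections:
        # Skip LINE entries (they repeat the WORD entries) and weak detections.
        if detection["Type"] != "WORD" or detection["Confidence"] < min_confidence:
            continue
        if detection["DetectedText"].lower() not in textract_words:
            missed.append(detection["DetectedText"])
    return missed
```

This is a plain text comparison, so a word that appears in both orientations would not be reported; comparing bounding boxes would be needed to handle that case.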

Tags:

text-extraction

ocr

amazon-textract

ryanjackson