Amazon Textract returns different results between WebApp Demo, AnalyzeDocumentRequest and StartDocumentAnalysisRequest

Question

This is my first question on Stack Overflow. I would like to extract key-value pairs (FORMS) from a scanned PDF document via Amazon Textract. What I have noticed, however, is that some key-value pairs returned by the web app demo (https://us-east-2.console.aws.amazon.com/textract/home?region=us-east-2#/demo) are missing from the results of the API methods that can be called from code.

Furthermore, the two API methods disagree with each other. The synchronous method (AnalyzeDocumentRequest), which does not accept PDF and so requires converting the document to an image first, finds key-value pairs (Sync Result Example) that the asynchronous method (StartDocumentAnalysisRequest) does not (Async Result Example).

The problem is similar to the one described in this question, which discusses the difference in results between the two methods of analyzing a document: AWS Textract - GetDocumentAnalysisRequest only returns correct results for first page of document

The code implementation matches these examples:

Synchronous Method: https://docs.aws.amazon.com/textract/latest/dg/examples-extract-kvp.html
Asynchronous Method: https://github.com/awsdocs/amazon-textract-developer-guide/blob/master/doc_source/async-analyzing-with-sqs.md

Has anyone ever had the same problem?

Jonathan Smith · Accepted Answer

We had this problem recently. The demo website provided by AWS found 50 fields, while our own code using the provided API yielded only 30.

After some trial and error and a lot of googling, we found that the response returned by GetDocumentAnalysisAsync included a NextToken, which is used to request more results. It turns out we had to call GetDocumentAnalysisAsync again with this token, and keep repeating, until the response no longer included a NextToken.

At that point we knew we had all the data.
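The loop described above can be sketched as follows. This is a minimal illustration in Python rather than the .NET SDK mentioned in the answer. It assumes a boto3 Textract client, whose get_document_analysis call accepts JobId and an optional NextToken, and it assumes the job has already finished successfully. The helper name collect_all_blocks is hypothetical.

```python
def collect_all_blocks(client, job_id):
    """Gather the Blocks from every result page of a finished analysis job.

    `client` is assumed to behave like boto3.client("textract"): its
    get_document_analysis method takes JobId and an optional NextToken and
    returns a dict that may contain "Blocks" and "NextToken".
    """
    blocks = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            # Only send NextToken once a previous page has supplied one.
            kwargs["NextToken"] = next_token
        response = client.get_document_analysis(**kwargs)
        blocks.extend(response.get("Blocks", []))
        next_token = response.get("NextToken")
        if not next_token:
            # No token in the response means this was the last page.
            return blocks
```

With a real client this would be called as collect_all_blocks(boto3.client("textract"), job_id), where job_id is the JobId returned by the start call. Reading only the first response, without following NextToken, would explain a field count lower than the demo's.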


Tags:

amazon-textract

the_nibble
