I am using the asynchronous APIs of Amazon Textract to analyze PDF documents. After the operations complete, the output Textract JSON is sometimes missing a few pages. What is the reason for the missing pages?
For example, this document has 4 pages.

But the extracted information is only available for 2 pages.

This is the document information

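It's the NextToken. Results from the async Get operations are paginated: each response returns at most MaxResults blocks (1,000 by default), so a multi-page document can span several responses. When NextToken is populated, you need to make another call, passing that token along with the same JobId, to get the next segment of results. When NextToken is null, you have all the results.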
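A minimal sketch of that loop in Python follows. It assumes the job has already finished (JobStatus is SUCCEEDED) and that `get_page` behaves like boto3's Textract `get_document_analysis` (or `get_document_text_detection`): it takes `JobId` and an optional `NextToken` keyword argument and returns a dict with `Blocks` and, when more results remain, `NextToken`. The function name `get_all_blocks` is hypothetical, chosen for illustration.

```python
from typing import Any, Callable, Dict, List


def get_all_blocks(get_page: Callable[..., Dict[str, Any]], job_id: str) -> List[Dict[str, Any]]:
    """Collect every Block of a finished async Textract job by following NextToken.

    get_page is assumed to accept JobId and an optional NextToken as keyword
    arguments, like boto3's Textract get_document_analysis.
    """
    blocks: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {"JobId": job_id}
    while True:
        response = get_page(**kwargs)
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            # No NextToken: this response was the last segment of results.
            break
        # Request the next segment using the token from this response.
        kwargs["NextToken"] = token
    return blocks
```

With boto3 this could be called as `get_all_blocks(boto3.client("textract").get_document_analysis, job_id)`. Passing the Get function in as a parameter keeps the loop independent of the client, so the same helper works for text detection jobs.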
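I'm using the CLI, but how do I get the remaining results there?

With the CLI, pass the token from the previous response with --next-token, together with the same job ID:

aws textract get-document-analysis --next-token FLpA6... --job-id 12345....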
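Once every segment has been fetched, a quick sanity check is to compare the pages present in the collected blocks with the page count Textract reports. This is a sketch under the assumption that `expected_pages` comes from `DocumentMetadata["Pages"]` in a Get response and that each `PAGE` block carries a 1-based `Page` field, as async results do; the helper name `missing_pages` is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def missing_pages(blocks: Iterable[Dict[str, Any]], expected_pages: int) -> List[int]:
    """Return the 1-based page numbers that have no PAGE block in the results."""
    # Collect the page numbers of all PAGE blocks that were returned.
    seen = {b.get("Page") for b in blocks if b.get("BlockType") == "PAGE"}
    return [p for p in range(1, expected_pages + 1) if p not in seen]
```

For the 4-page document in the question, a non-empty result such as `[3, 4]` after reading only the first response would suggest that NextToken was not followed.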