 

Amazon Textract JSON missing some pages

I am using the asynchronous APIs of Amazon Textract to analyze PDF documents. After the operations complete, the output Textract JSON is sometimes missing a few pages. Why are some pages missing?

For example, this document has 4 pages.


But extraction results are only available for 2 pages.


This is the document information: [screenshot]

Asked by gokublack on Sep 15 '25 at 15:09



1 Answer

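It's the NextToken. Results of an asynchronous analysis job are paginated: each GetDocumentAnalysis response returns at most MaxResults blocks (1,000 by default, which is also the maximum), so a multi-page document is often split across several responses. When NextToken is populated, you need to make another call, passing that token, to get the next segment of results. When NextToken is absent (null), you have all the results.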
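A minimal Python (boto3) sketch of that loop, assuming the job has already finished with JobStatus SUCCEEDED and that client is a boto3 Textract client (for example boto3.client("textract")). The function name get_all_analysis_blocks is hypothetical; get_document_analysis and its JobId, NextToken, and Blocks fields are the real SDK names.

```python
from typing import Any, Dict, List, Optional


def get_all_analysis_blocks(client: Any, job_id: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: collect every Block of a finished async Textract
    analysis job by following NextToken until no token is returned.

    Assumes `client` is a boto3 Textract client and the job has SUCCEEDED.
    """
    blocks: List[Dict[str, Any]] = []
    next_token: Optional[str] = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            # Ask for the segment that follows the previous response.
            kwargs["NextToken"] = next_token
        response = client.get_document_analysis(**kwargs)
        blocks.extend(response.get("Blocks", []))
        next_token = response.get("NextToken")
        if not next_token:
            # No NextToken means this was the last segment of results.
            break
    return blocks
```

The same loop should work for text-only jobs by calling get_document_text_detection instead, which uses the same JobId/NextToken pagination.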

I'm using the CLI, but the SDKs take the same JobId and NextToken parameters:

aws textract get-document-analysis --next-token FLpA6... --job-id 12345....
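If you keep the raw JSON of every response (the first call plus each --next-token follow-up), one way to confirm nothing is missing is to compare DocumentMetadata.Pages with the Page numbers that actually appear on the returned blocks. This is a sketch under the assumption that every page yields at least one block with a Page field (Textract normally emits a PAGE block per page); the helper name missing_pages is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def missing_pages(responses: Iterable[Dict[str, Any]]) -> List[int]:
    """Hypothetical helper: return page numbers counted in DocumentMetadata
    but not present on any Block across the given responses."""
    expected = 0
    seen = set()
    for response in responses:
        # DocumentMetadata.Pages holds the total page count of the document.
        expected = max(expected, response.get("DocumentMetadata", {}).get("Pages", 0))
        for block in response.get("Blocks", []):
            if "Page" in block:
                seen.add(block["Page"])
    return [page for page in range(1, expected + 1) if page not in seen]
```

Run on only the first response of a 4-page job, this typically reports the later pages as missing; once all NextToken segments are included, the list should be empty.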

Answered by altintx on Sep 18 '25 at 09:09
