AWS Textract not working with Invoices in PDFs. Any advice?

Question

The AWS Textract page has a "Try" feature where we can upload invoices as PDF, JPEG, etc. When I uploaded a PDF, it didn't work: no tables were shown, no forms (key-value pairs) were shown, nothing. But when I uploaded an invoice as a JPEG, it worked well. I don't understand why.

I searched all over the internet but couldn't find a solution. Some people have never even heard of AWS Textract, even though I found it to be better than Google Document AI.

Please help!

Thomas · Accepted Answer

You can use the amazon-textract-textractor package to simplify calling Amazon Textract and parsing its output. Here is a tutorial on how to use the AnalyzeExpense API: https://aws-samples.github.io/amazon-textract-textractor/notebooks/using_analyze_expense.html

If your PDF is a single page, you can use the SYNC .analyze_expense API like this:

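from textractor import Textractor

# Uses the "default" profile from your local AWS credentials
extractor = Textractor(profile_name="default")

# Synchronous call: suitable for a single-page document
document = extractor.analyze_expense(
    file_source="invoice.pdf",
    save_image=True,  # keep the page image so visualize() can draw on it
)
document.visualize(with_words=False)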
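
If you want the extracted values rather than the visualization, the sketch below flattens the summary fields of a raw AnalyzeExpense response (the dict shape returned by boto3's analyze_expense) into simple rows. The helper name summary_fields and the sample response are made up for illustration; only the key names (ExpenseDocuments, SummaryFields, Type, LabelDetection, ValueDetection) come from the Textract response format. Treat it as a starting point, not as how Textractor itself parses the output.

```python
from __future__ import annotations

from typing import Any


def summary_fields(response: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten SummaryFields from an AnalyzeExpense response into simple rows."""
    rows = []
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            rows.append({
                # Normalized field type, e.g. VENDOR_NAME or TOTAL
                "type": field.get("Type", {}).get("Text", ""),
                # Label as printed on the invoice (may be absent)
                "label": field.get("LabelDetection", {}).get("Text", ""),
                # Detected value for the field
                "value": field.get("ValueDetection", {}).get("Text", ""),
            })
    return rows


# Hand-written sample in the shape of a response (not actual Textract output).
sample_response = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {
                    "Type": {"Text": "VENDOR_NAME"},
                    "ValueDetection": {"Text": "Example Corp"},
                },
                {
                    "Type": {"Text": "TOTAL"},
                    "LabelDetection": {"Text": "Total"},
                    "ValueDetection": {"Text": "$120.00"},
                },
            ]
        }
    ]
}

for row in summary_fields(sample_response):
    print(row["type"], row["label"], row["value"], sep=" | ")
```

With real data, the response would come from something like boto3.client("textract").analyze_expense(Document={"Bytes": pdf_bytes}), where pdf_bytes is the content of your single-page file.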

If your PDF document is multi-page, you need to use the ASYNC .start_expense_analysis API. You can do it like this:

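from textractor import Textractor

extractor = Textractor(profile_name="default")

# Asynchronous call: the file is uploaded to S3 and processed as a job
document = extractor.start_expense_analysis(
    file_source="./multipage_invoice.pdf",
    s3_upload_path="<YOUR S3 BUCKET>",
    s3_output_path="<YOUR S3 BUCKET>",
    save_image=True,  # keep the page images so visualize() can draw on them
)
# [0] selects the first page's image from the result
document.visualize(with_words=False)[0]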
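
For context on what "ASYNC" means here: the asynchronous route starts a job and the result is fetched later by polling. Below is a rough sketch of that polling loop against a plain boto3-style Textract client, in case you want to see the flow without Textractor. The function name wait_for_expense_job, the polling interval, and the timeout are illustrative assumptions; the client methods and response keys (get_expense_analysis, JobStatus, ExpenseDocuments, NextToken) are from the Textract API.

```python
from __future__ import annotations

import time
from typing import Any


def wait_for_expense_job(
    client: Any,
    job_id: str,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> list[dict]:
    """Poll GetExpenseAnalysis until the job finishes, then collect all result pages."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_expense_analysis(JobId=job_id)
        status = response["JobStatus"]
        if status != "IN_PROGRESS":
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Textract job {job_id} did not finish in time")
        time.sleep(poll_seconds)

    # This sketch treats anything other than SUCCEEDED (FAILED, PARTIAL_SUCCESS) as an error.
    if status != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    documents = list(response.get("ExpenseDocuments", []))
    # Results can be paginated; follow NextToken until it is absent.
    while response.get("NextToken"):
        response = client.get_expense_analysis(
            JobId=job_id, NextToken=response["NextToken"]
        )
        documents.extend(response.get("ExpenseDocuments", []))
    return documents


# Example wiring (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("textract")
# job = client.start_expense_analysis(
#     DocumentLocation={"S3Object": {"Bucket": "<YOUR S3 BUCKET>", "Name": "multipage_invoice.pdf"}}
# )
# documents = wait_for_expense_job(client, job["JobId"])
```

The client is passed in as a parameter so the loop can be exercised with a stub before pointing it at a real account.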

Tags:

amazon-web-services

amazon-textract

Tushar

1 Answer

Thomas
