Invoice automatic data extraction OCR or PDF [closed]

Tags: ocr, invoice

I am looking for a solution to extract data from my invoices to send a summary to my accountant.

Some companies provide such services for around 20€ a month, and they usually recognise invoices very well. But the services I tried either don't extract all the data I need or lack functionality such as an Excel export that I could send to my accountant. And paying 20€ a month and managing yet another service for 5 invoices per month hasn't appealed to me so far.

I did some research and found this Stack Overflow question: Can anyone recommend OCR software to process invoices?

It's a bit outdated, and I hope to find some more up-to-date recommendations. I tried the Ephesoft community edition and it looked very promising at first. But the software has a learning step and a review step, and corrections made in the review step don't seem to be fed back into the learning step. Plus it feels more cumbersome than just doing it by hand. I assume it's made for big businesses.

I am looking for a simple data extraction software, which learns with each step I show it.

I also had a look at Apache Tika, but it doesn't seem to be ready to use with a simple web interface.

  1. Do you have any recommendations for paid OCR services? They should be flexible enough to extract total VAT amount / VAT % / total amount / total amount currency / VAT currency / which account it was paid with / company name, and offer an export to Excel.

  2. Do you have some recommendations for open source software?

  3. Do you have any general advice on how you handle your few (fewer than 50 a year) invoices?

Toby asked Jul 01 '17 10:07



1 Answer

Apart from raw OCR with regexes on top (which may work fine for some very limited use cases), there are several other options that offer API access. These you can start using right away, without any demo or sales process:

  • TagGun - specializes in receipts, can extract line items too, free for 50 receipts monthly
  • Elis - specializes in invoices, supports a wide variety of templates automatically (a pre-trained machine learning model), free for under 300 invoices monthly

If you are willing to go through a sales process (these vendors do appear to be real and active):

  • LucidTech and Itemize (not sure what their accuracy is or which fields they extract, as their API details are not public)
  • FlexiCapture Engine - based on templates, if you are willing to define one for each specific invoice format

(disclaimer: I'm affiliated with Rossum, the vendor of Elis. Feel free to suggest edits adding other APIs!)
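For a handful of invoices from the same few suppliers, the "raw OCR plus regexes" option mentioned at the top of this answer can be sketched roughly as below. This is only an illustration under stated assumptions: the sample text, the field names, and the patterns are all hypothetical and match one made-up layout (comma as decimal separator, no thousands separators, a three-letter currency code after each amount). The text is assumed to come from whatever OCR tool you already use. The CSV output is meant as a simple stand-in for an Excel export, since Excel opens CSV files.

```python
import csv
import io
import re
from decimal import Decimal

# Hypothetical OCR output for one invoice layout (an assumption, not real data).
SAMPLE_TEXT = """ACME GmbH
Invoice No. 2017-042
Subtotal: 100,00 EUR
VAT 19%: 19,00 EUR
Total: 119,00 EUR
"""

# Hypothetical patterns; each supplier layout would likely need its own.
VAT_RE = re.compile(
    r"^VAT\s+(\d{1,2}(?:[.,]\d+)?)\s*%\s*:\s*(\d+[.,]\d{2})\s*([A-Z]{3})", re.M
)
TOTAL_RE = re.compile(r"^Total\s*:\s*(\d+[.,]\d{2})\s*([A-Z]{3})", re.M)

FIELDS = [
    "company", "vat_percent", "vat_amount",
    "vat_currency", "total_amount", "total_currency",
]


def parse_amount(raw: str) -> Decimal:
    """Convert '119,00' or '119.00' to a Decimal (no thousands separators)."""
    return Decimal(raw.replace(",", "."))


def extract_fields(text: str) -> dict:
    """Pull the summary fields out of OCR text; missing fields stay None."""
    fields = dict.fromkeys(FIELDS)
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if lines:
        # Assumption: the supplier name is the first non-empty line.
        fields["company"] = lines[0]
    vat = VAT_RE.search(text)
    if vat:
        fields["vat_percent"] = parse_amount(vat.group(1))
        fields["vat_amount"] = parse_amount(vat.group(2))
        fields["vat_currency"] = vat.group(3)
    total = TOTAL_RE.search(text)
    if total:
        fields["total_amount"] = parse_amount(total.group(1))
        fields["total_currency"] = total.group(2)
    return fields


def to_csv(rows: list) -> str:
    """Serialize extracted invoices to CSV text, one row per invoice."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv([extract_fields(SAMPLE_TEXT)]))
```

A new supplier or a slightly different layout will usually break patterns like these, which is why this approach only suits very limited use cases.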

Petr Baudis answered Oct 18 '22 10:10