 

Payable Invoice Capturing OR extracting automation [closed]

I am creating a desktop/WinForms application that reads TIF/PDF payable invoices and extracts all the invoice information to store in a database.

I can read standard barcodes (QR Code, Code 39, etc.) and some of the standard payable invoice fields (Invoice Date, Company Name, Address) by running OCR on specific regions of the image, but I am unable to capture line items and amounts correctly.

I extract information in two phases:
1. Read specific regions based on a template (user-mapped regions for specific fields).
2. OCR the whole page and search for standard payable invoice field names and their values.
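The second phase can be sketched as follows. This is only an illustration under assumptions, written in Python for brevity (the logic ports directly to C#): the `Word` type, `group_into_lines` and `find_field` are hypothetical names, and the OCR engine is assumed to return each word together with the pixel position of its top-left corner. Words are grouped into text lines by vertical position, and the value is taken to be whatever follows the label on the same line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One OCR result: the recognized text and its top-left pixel position."""
    text: str
    x: float
    y: float


def group_into_lines(words: List[Word], y_tolerance: float = 5.0) -> List[List[Word]]:
    """Group words whose top edges are within y_tolerance pixels into one line."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(lines[-1][0].y - word.y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, read left to right.
    return [sorted(line, key=lambda w: w.x) for line in lines]


def find_field(words: List[Word], label: str) -> Optional[str]:
    """Return the text that follows `label` on the same line, or None."""
    tokens = label.lower().split()
    for line in group_into_lines(words):
        texts = [w.text.lower().rstrip(":") for w in line]
        for i in range(len(texts) - len(tokens) + 1):
            if texts[i:i + len(tokens)] == tokens:
                value = " ".join(w.text for w in line[i + len(tokens):])
                return value or None
    return None


# Made-up OCR output for illustration only.
sample_words = [
    Word("Invoice", 10, 100), Word("Date:", 80, 101), Word("16/11/2013", 150, 100),
    Word("Total", 10, 300), Word("120.50", 150, 299),
]
```

With the sample above, `find_field(sample_words, "Invoice Date")` picks up the date printed to the right of the label. A real invoice would also need handling for values printed below the label and for OCR misreads of the label itself.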

I know of the following three approaches:
1. Create a template for one type of invoice and process all invoices of that type with it.
2. A neural-network-based engine that needs to be trained with sample data so that it works from patterns.
3. Form processing, a kind of OMR: the OCR looks at the exact coordinates where the fields were placed on the form (during form design).
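For the coordinate-based approaches (1 and 3), one possible way to keep a template independent of scan resolution is to store each region as fractions of the page size and convert it to pixels per page. The sketch below is in Python for brevity; `TEMPLATE`, `to_pixel_box` and `boxes_for_page` are hypothetical names, and the region values are invented for illustration.

```python
from typing import Dict, Tuple

# (left, top, right, bottom), each expressed as a fraction of the page size.
Region = Tuple[float, float, float, float]
PixelBox = Tuple[int, int, int, int]

# Hypothetical template for one invoice layout; the numbers are made up.
TEMPLATE: Dict[str, Region] = {
    "invoice_date": (0.70, 0.05, 0.95, 0.10),
    "company_name": (0.05, 0.05, 0.50, 0.10),
}


def to_pixel_box(region: Region, page_width: int, page_height: int) -> PixelBox:
    """Scale a fractional region to pixel coordinates for a given page size."""
    left, top, right, bottom = region
    return (
        round(left * page_width),
        round(top * page_height),
        round(right * page_width),
        round(bottom * page_height),
    )


def boxes_for_page(template: Dict[str, Region], page_width: int,
                   page_height: int) -> Dict[str, PixelBox]:
    """Compute the pixel box to crop and OCR for every field in the template."""
    return {
        name: to_pixel_box(region, page_width, page_height)
        for name, region in template.items()
    }
```

Each resulting box is in (left, upper, right, lower) order, so it can be handed to whatever crop-then-OCR call the chosen engine provides. This only helps when every scan of a layout has the same margins and orientation; skewed or shifted scans need to be aligned first.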

Question:
How can I extract payable invoice data using OCR or some intelligent reader?
Primarily I am looking for an algorithm (C# + OCR engine) or a general philosophy of payable invoice capturing, but a reference to an SDK with this feature or a solid commercial product would be helpful too.

I googled and found Abbyy FlexiCapture Engine and IRIS Capture & Extract somewhat promising, but most such products are based on templates or training. They claim that no template or training is required, but nothing looks like 100% automatic capture.

Kindly refer me to some product (at least with a free trial), SDK, or example/sample.

Munawar asked Nov 16 '13 17:11





1 Answer

As of 2018, the situation has improved a bit. To recapitulate the main approaches today:

  • A raw OCR engine (Tesseract, Abbyy, Google OCR etc.) plus regexes is still an option (this may work just fine for some very limited use cases)
  • Abbyy FlexiCapture Engine - still going strong, but still based on templates, so it fits if you are willing to define a new template for each specific invoice format
  • Rossum Elis (invoices), TagGun (receipts), ... - APIs based on pre-trained machine learning models, i.e. usable and working immediately, with free monthly volumes
  • LucidTech, Itemize, ... - less accessible APIs with similar functionality (you need to go through a demo and sales process)
  • Datamolino, CloudFactory, ... - APIs with humans behind the scenes performing the data transcription manually (different latency, pricing and accuracy structure)
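
As a rough illustration of the first option (raw OCR plus regexes) applied to line items, here is a minimal sketch in Python; the same pattern works with .NET's `Regex` in C#. It assumes the OCR text keeps one line item per text line in the column order description, quantity, unit price, amount, with a period as the decimal separator. `LINE_ITEM`, `parse_line_items` and `is_consistent` are hypothetical names, and real invoices will need a different pattern per layout.

```python
import re
from decimal import Decimal
from typing import Dict, List

# Assumed column order: description, quantity, unit price, amount.
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)\s+"
    r"(?P<qty>\d+)\s+"
    r"(?P<unit_price>\d+\.\d{2})\s+"
    r"(?P<amount>\d+\.\d{2})$"
)


def parse_line_items(ocr_text: str) -> List[Dict[str, object]]:
    """Return one dict per text line that matches the assumed item layout."""
    items: List[Dict[str, object]] = []
    for line in ocr_text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            items.append({
                "description": match.group("description"),
                "qty": int(match.group("qty")),
                "unit_price": Decimal(match.group("unit_price")),
                "amount": Decimal(match.group("amount")),
            })
    return items


def is_consistent(item: Dict[str, object]) -> bool:
    """Cross-check one item: quantity times unit price should equal the amount."""
    return item["qty"] * item["unit_price"] == item["amount"]


# Made-up OCR text for illustration only.
sample_text = """Description Qty Price Amount
Widget A 2 10.00 20.00
Cable 3m 1 4.50 4.50
Total 24.50"""
```

The arithmetic cross-check is a cheap way to flag lines where the OCR misread a digit, which is one of the reasons this approach only holds up for very limited use cases.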
Petr Baudis answered Sep 18 '22 16:09
