
How to extract relevant information from receipt

I am trying to extract information from a range of different receipts using a combination of OpenCV, Tesseract and Keras. The end result of the project is that I should be able to take a picture of a receipt with a phone and, from that picture, get the store name, payment type (card or cash), amount paid and change tendered.

So far I have applied a few preprocessing steps to a series of sample receipts using OpenCV, such as removing the background, denoising and converting to a binary image, and am left with an image such as the following:

[Image: scanned receipt after preprocessing]

I am then using Tesseract to perform OCR on the receipt and write the results out to a text file. The OCR now performs at an acceptable level, so I can take a picture of a receipt, run my program on it, and get a text file containing all the text on the receipt.
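For context, the current pipeline looks roughly like this (a simplified sketch; the real preprocessing has more steps and tuned parameters, and pytesseract here just stands in for however Tesseract is actually invoked):

    import cv2
    import pytesseract

    # Read the phone photo and convert to grayscale
    image = cv2.imread("receipt.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Denoise, then binarize with Otsu's threshold (illustrative choices;
    # the real pipeline also removes the background and tunes parameters)
    denoised = cv2.fastNlMeansDenoising(gray, h=30)
    _, binary = cv2.threshold(denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Run Tesseract on the cleaned-up image and dump everything it reads
    text = pytesseract.image_to_string(binary)
    with open("receipt.txt", "w") as f:
        f.write(text)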

My problem is that I don't want all of the text on the receipt, I just want certain information such as the parameters I listed above. I am unsure as to how to go about training a model that will extract the data I need.

Am I correct in thinking that I should use Keras to segment and classify different sections of the image, and then write to file the text in sections that my model has classified as containing relevant data? Or is there a better solution for what I need to do?

Sorry if this is a stupid question, this is my first Opencv/machine learning project and I'm pretty far out of my depth. Any constructive criticism would be much appreciated.

Asked Aug 21 '17 by R.E.

1 Answer

My answer isn't as fancy as what's in fashion right now, but I think it works in your case, especially if this is for a product (not for research & publication purposes).

Example steps

I would implement the paper Text/Graphics Separation Revisited. I have already implemented it in both MATLAB & C++, and I guarantee from your description it won't take you long. In summary (a rough Python sketch of the first steps follows the list):

  1. Get all connected components with stats. You're especially interested in the bounding box of each character.

  2. The paper obtains thresholds from histograms of the properties of your connected components, which makes it reasonably robust. Using these thresholds (which work surprisingly well) on the geometrical properties of your connected components, discard anything that's not a character.

  3. For your characters, get the centroids of all their bounding boxes and group the close centroids by your own criteria (height, vertical position, Euclidean distance, etc.). Use the resulting centroid clusters to create rectangular text regions.

  4. Associate text regions of the same height and vertical position.

  5. Run OCR on your text regions and look for keywords like "Cash". I honestly think you can get away with keyword dictionaries stored as plain text files, and from having done computer vision on mobile I know your resources are limited (and constrained by privacy too).
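To make steps 1-3 concrete, here is a rough sketch in Python/OpenCV (the numeric limits are placeholders; the paper derives them from histograms of the component statistics):

    import cv2

    # Binary receipt image; connectedComponents expects white text on black
    binary = cv2.imread("receipt_binary.png", cv2.IMREAD_GRAYSCALE)

    # Step 1: connected components with stats (bounding box, area, centroid)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

    # Step 2: keep only components whose geometry looks like a character.
    # These limits are placeholders; the paper derives them from histograms.
    chars = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if 5 < h < 60 and 2 < w < 60 and area > 10:
            chars.append((x, y, w, h, centroids[i]))

    # Step 3: group characters whose centroids sit on roughly the same line,
    # then take the bounding rectangle of each group as a text region.
    chars.sort(key=lambda c: c[4][1])               # sort by centroid y
    groups, current = [], [chars[0]]
    for c in chars[1:]:
        if abs(c[4][1] - current[-1][4][1]) < 10:   # same-line tolerance (px)
            current.append(c)
        else:
            groups.append(current)
            current = [c]
    groups.append(current)

    regions = []
    for g in groups:
        xs = [x for x, y, w, h, c in g]
        ys = [y for x, y, w, h, c in g]
        regions.append((min(xs), min(ys),
                        max(x + w for x, y, w, h, c in g),
                        max(y + h for x, y, w, h, c in g)))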

I honestly don't think a neural net will do much better than some kind of keyword matching (e.g. using Levenshtein distance or something similar to add a bit of robustness), because you would need to manually collect and label those words anyway to create your training dataset, so... why not just write them down instead?
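For example, something this simple already gets you fuzzy keyword matching (difflib from the standard library stands in for a real Levenshtein implementation, and the keyword list is obviously made up):

    import difflib

    # Tiny "dictionary": OCR word -> which receipt field it signals (made up)
    KEYWORDS = {"cash": "payment_type", "card": "payment_type",
                "total": "amount_paid", "change": "change_tendered"}

    def match_keyword(word, cutoff=0.8):
        """Map a recognized word to a receipt field, tolerating OCR errors."""
        hits = difflib.get_close_matches(word.lower(), list(KEYWORDS), n=1, cutoff=cutoff)
        return KEYWORDS[hits[0]] if hits else None

    # Typical OCR misreads still land on the right field; unrelated words don't
    for token in ["CASH", "Tota1", "chanqe", "thanks"]:
        print(token, "->", match_keyword(token))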

That's basically it. You end up with something very fast (especially if you want to use a phone and you can't send images to a server) and it just works. No machine learning needed, so no dataset needed either.

But if this is for school... sorry I was so rude. Please use TensorFlow with 10,000 manually labeled receipt images and natural language processing methods; your professor will be happy.

Answered Sep 29 '22 by DanyAlejandro