
Scan and read a document with tick boxes

Tags:

ocr

I have a request from a customer who wishes to provide meals to elderly people in different localities. To do this the people fill out a form for the week and tick boxes depending on their choices for each day (it also takes into account specific requirements).

For example:

 Name
 Commune

                  With salt ( )          Without salt ( )

Mon:        Meal 1 ( )                   Meal 2 ( )
            Dessert 1 ( )                Dessert 2 ( )

Tues:       Meal 1 ( )                   Meal 2 ( )
            Dessert 1 ( )                Dessert 2 ( )

The data from each sheet should then be compiled to tell us how many of each type of meal to prepare each day for each commune...

The sheets are all the same, so I am hoping to be able to scan them in and automatically read them.

I do not know of any software that allows me to do this. What is the best way of accomplishing this task? At the moment I am looking at tesseract, but maybe there is some simpler technique?

EDIT: We are talking about several hundred forms a week. Ideally we will scan them all in one batch, extract the data and store the forms electronically.

Tom Macdonald asked May 15 '13 08:05



1 Answer

What you need is not OCR, which implies reading machine-printed characters, but ICR/OMR software, also known as form processing or data capture. ICR (Intelligent Character Recognition) deals with hand-printed text. OMR stands for Optical Mark Recognition, which is exactly what you are trying to do: recognize the values of checkmarks and checkboxes.

Additional information about handwriting recognition is in this question: "ICR for machine printed text?"

Because your forms are all the same, they fall into the category of "fixed forms", which a template-based software package can process. Here is a short document explaining the differences between form types: www.wisetrend.com/files/Structured_vs_Semi-Structured.pdf

The blank form itself should also be designed for machine recognition. It should have reference marks so the template can be aligned reliably, a clear flow so users fill it out naturally, check boxes of an appropriate size, and so on.
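To illustrate why reference marks help, here is a minimal sketch (not taken from any commercial package) that estimates how far a scan is shifted relative to the blank template by locating one solid reference mark, then moves a checkbox area by the same amount. The function names, the search window and the synthetic 100x100 pages are all assumptions for illustration. It handles translation only; real scans also need correction for rotation and scale, which usually takes two or more marks.

```python
import numpy as np

def find_mark_center(page, window, dark_threshold=128):
    """Centroid (row, col) of dark pixels inside window=(top, left, height, width)."""
    top, left, h, w = window
    rows, cols = np.nonzero(page[top:top + h, left:left + w] < dark_threshold)
    if rows.size == 0:
        raise ValueError("no reference mark found in search window")
    return top + rows.mean(), left + cols.mean()

def estimate_shift(template, scan, window):
    """Whole-pixel (d_row, d_col) offset of the scan's mark relative to the template's."""
    t_row, t_col = find_mark_center(template, window)
    s_row, s_col = find_mark_center(scan, window)
    return int(round(s_row - t_row)), int(round(s_col - t_col))

def shift_box(box, shift):
    """Move a (top, left, height, width) area by the estimated offset."""
    top, left, h, w = box
    return (top + shift[0], left + shift[1], h, w)

# Synthetic pages (hypothetical data): white background, one solid 10x10 mark.
template = np.full((100, 100), 255, dtype=np.uint8)
template[5:15, 5:15] = 0
scan = np.full((100, 100), 255, dtype=np.uint8)
scan[8:18, 3:13] = 0          # same mark, 3 pixels lower and 2 pixels to the left

window = (0, 0, 30, 30)       # where the mark is expected to appear
shift = estimate_shift(template, scan, window)
print(shift)                                  # (3, -2)
print(shift_box((50, 40, 20, 20), shift))     # (53, 38, 20, 20)
```

The search window has to be large enough to contain the mark after the worst expected misfeed, and it must not contain any other printing, otherwise the centroid is pulled off the mark.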

I believe FlexiCapture will do everything you need: link. There are several other solutions that can perform a similar process. I work as an integrator/consultant for paper-based form-processing projects.

I removed your "mobile" tag, as I believe you are not planning to use a cell phone to capture these images. If you are, I would advise against it if you have other options. You mentioned scanning the forms on a conventional scanner, which is the best way to get good image quality. Trust me, you will have enough to deal with when processing hand-filled forms, so optimize your forms, scanning, software and process as much as possible.

If you are interested in developing it yourself, that is possible. The process is to compare each image area (one per checkbox) with some 'baseline', for example the same area on the blank form, to see whether it contains additional handwriting. If the difference is over some threshold, the box is considered checked. Typical issues are alignment of the areas and borderline threshold cases (a small or light tick mark). Commercial packages handle these automatically.
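As a rough sketch of that comparison, assuming the scan is already aligned with the blank form and both are loaded as grayscale NumPy arrays, the code below measures how much extra dark ink each checkbox area contains relative to the blank form. The helper names, the box coordinates and the 3% threshold are hypothetical and would need tuning on real scans.

```python
import numpy as np

def ink_fraction(region, dark_threshold=128):
    """Fraction of pixels in a grayscale region darker than dark_threshold."""
    return float(np.mean(region < dark_threshold))

def is_checked(scan, blank, box, min_extra_ink=0.03):
    """True if the box area holds noticeably more ink in the scan than in the blank form.

    scan, blank: 2-D uint8 grayscale arrays of the same, already aligned, page.
    box: (top, left, height, width) in pixels.
    min_extra_ink: assumed threshold, to be tuned on real data.
    """
    top, left, h, w = box
    scan_ink = ink_fraction(scan[top:top + h, left:left + w])
    base_ink = ink_fraction(blank[top:top + h, left:left + w])
    return (scan_ink - base_ink) > min_extra_ink

def make_blank(size=(60, 120)):
    """Synthetic blank form: white page with two empty 20x20 box outlines."""
    page = np.full(size, 255, dtype=np.uint8)
    for left in (10, 70):
        page[20, left:left + 20] = 0      # top edge
        page[39, left:left + 20] = 0      # bottom edge
        page[20:40, left] = 0             # left edge
        page[20:40, left + 19] = 0        # right edge
    return page

blank = make_blank()
scan = blank.copy()
for i in range(4, 16):                    # draw a 2-pixel-thick diagonal tick in box 1
    scan[20 + i, 10 + i] = 0
    scan[20 + i, 11 + i] = 0

boxes = {"meal_1": (20, 10, 20, 20), "meal_2": (20, 70, 20, 20)}
results = {name: is_checked(scan, blank, box) for name, box in boxes.items()}
print(results)   # {'meal_1': True, 'meal_2': False}
```

Subtracting the blank form's ink means the printed box outline does not count as a mark. The borderline cases mentioned above show up here as differences close to `min_extra_ink`; a practical system would probably flag those for manual review instead of deciding automatically.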

Please let me know if you need any additional guidance.

ilya evdokimov

Ilya Evdokimov answered Sep 29 '22 08:09
