I would like to develop an app that can recognise some numbers on a computer-printed card (located in fixed positions on the card) and then send them to a webservice.
I know that I should use OCR, but I'm not sure which product would fit my needs. It would be great if you could suggest any APIs or products on the market that could help me with this project. Open source is not a must, but it would be very welcome :)
Besides that, I have another technical question: would you implement the OCR recognition on the device, or would you do it in a webservice and call it, passing the picture to it? What are the pros and cons of each model?
Scan text directly from the iPhone camera: you can also scan text directly from the iPhone's Camera app. Point the camera at the words; if the software recognizes text, it displays the three-lines-of-text icon in the bottom right of the viewfinder (top right in landscape). Press that button.
If you need a solution that locates specific fields on an image, then it is not just OCR but a Data Capture task. There are several approaches to solving it: write your own field detection based on OCR output, as was suggested in another answer, or use a toolkit that is specifically designed for this and offers visual tools for defining the layout structure.
The first way requires more programming but is cheaper in terms of licensing. You can choose not only commercial but also open source OCR libraries like Tesseract, which may not be perfect but, with some tweaking and font training, can be good enough for many tasks.
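A minimal sketch of the "write your own field detection on top of OCR output" approach. It assumes the OCR engine has already returned per-word text with pixel bounding boxes (Tesseract can emit these in its TSV output) and that the card has already been cropped and straightened in the photo. The `Word` type, the field names and the `FIELDS` coordinates are hypothetical and would have to be measured from your own card. Regions are given as fractions of the image size so the same template works at different photo resolutions.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word and its bounding box in pixels (assumed OCR output shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


# Hypothetical card template: each field is (x0, y0, x1, y1) as fractions of the image size.
FIELDS = {
    "card_number": (0.05, 0.40, 0.95, 0.55),
    "expiry": (0.60, 0.70, 0.95, 0.85),
}


def extract_fields(words, img_w, img_h, fields=FIELDS):
    """Assign each OCR word to the first field whose region contains the word's center."""
    found = {name: [] for name in fields}
    for w in words:
        cx = (w.left + w.width / 2) / img_w
        cy = (w.top + w.height / 2) / img_h
        for name, (x0, y0, x1, y1) in fields.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                found[name].append(w)
                break  # words outside every region are simply ignored
    # Join each field's words left to right so multi-word values come out in reading order.
    return {
        name: " ".join(w.text for w in sorted(ws, key=lambda w: w.left))
        for name, ws in found.items()
    }


words = [
    Word("ACME", 50, 30, 200, 50),  # logo text outside every field: ignored
    Word("5678", 260, 270, 120, 40),
    Word("1234", 100, 270, 120, 40),
    Word("12/27", 700, 460, 150, 40),
]
print(extract_fields(words, img_w=1000, img_h=600))
# {'card_number': '1234 5678', 'expiry': '12/27'}
```

This only works as long as there is one predictable layout; every new layout means another hand-maintained template.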
When dealing with low-quality images (and images taken with a phone camera will include a significant portion of those), your field location solution will have to handle cases where some parts of the image were not recognized or were recognized wrongly, and still be able to locate the fields you want. You may also want to cross-check several recognition variants to produce reasonable combinations.
This is not trivial and will take some time to get working reliably. But it is still doable, provided your documents are not very complicated and there is just one layout that is very predictable. And once you own the code, it can run both on the server and on the phone.
If you are dealing with somewhat more complex documents and a variety of layout variants, maintaining this logic in pure code can become too difficult. In this case it is better to look at more advanced Data Capture technologies. There are quite a number of Data Capture products out there, but I know of just one that is offered in the form of an API: http://www.abbyy.com/flexicapture_engine/
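One simple way to cross-check several recognition variants, sketched under assumptions: read the same numeric field more than once (for example from differently thresholded crops), map look-alike letters to digits, and take a per-position majority vote. The confusion table is an assumption to tune for your font, and this is only one of many possible voting schemes.

```python
from collections import Counter

# Assumed table of letters often misread in place of digits; adjust for your font.
CONFUSABLE = str.maketrans({"O": "0", "o": "0", "D": "0", "I": "1", "l": "1",
                            "|": "1", "S": "5", "B": "8", "Z": "2"})


def normalize_digits(text):
    """Map look-alike characters to digits and drop anything that is still not a digit."""
    return "".join(ch for ch in text.translate(CONFUSABLE) if ch.isdigit())


def vote(variants):
    """Combine several readings of a numeric field by per-position majority vote.

    Only readings with the most common length take part, so a reading with a
    dropped or extra character does not shift the positions of the others.
    Returns None when no reading contains any digits.
    """
    cleaned = [c for c in (normalize_digits(v) for v in variants) if c]
    if not cleaned:
        return None
    length = Counter(len(c) for c in cleaned).most_common(1)[0][0]
    same_len = [c for c in cleaned if len(c) == length]
    return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*same_len))


print(vote(["12345", "1Z345", "12B45", "1234"]))  # 12345
```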
It has two components. The first is a visual tool to create and debug the document description. You just describe the logic of field location on the document, and the technology takes care of the rest: voting between different variants, handling recognition mistakes, and so on. You can define several alternative document structures, plus rules that check whether one value corresponds to another in the document layout. Those rules also influence the selection of the best recognition variants.
The second component is the actual API. You just plug it into your application and load the document template description. In a mobile recognition scenario it can only be used for server back-end processing, since it is too powerful and heavy to fit on a mobile device. However, the bright side is that you don't have to port it to every mobile OS, and it uses full-featured OCR technology, as opposed to the restricted versions that fit within mobile resources. This toolkit does include some advanced image processing technologies that make it work better on images captured by a phone.
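FlexiCapture's rules are defined in its own tools, so the following is not its API. It is a plain-Python illustration of the general idea that cross-field rules can pick among recognition variants: try every combination of candidate values, keep those that satisfy all rules, and return the one with the highest combined confidence. The field names, confidences and the "total equals the sum of the parts" rule are hypothetical.

```python
from itertools import product
from math import prod


def best_combination(candidates, rules):
    """Return the highest-scoring combination of field variants that passes every rule.

    candidates: {field_name: [(value, confidence), ...]}, e.g. from several OCR passes.
    rules: callables that take {field_name: value} and return True if the values agree.
    Returns (values, score), or (None, 0.0) when no combination satisfies all rules.
    """
    names = list(candidates)
    best, best_score = None, -1.0
    # Brute force over all combinations; fine for a handful of fields and variants.
    for combo in product(*(candidates[n] for n in names)):
        values = {n: value for n, (value, _) in zip(names, combo)}
        if all(rule(values) for rule in rules):
            # Treat per-field confidences as independent and multiply them.
            score = prod(conf for _, conf in combo)
            if score > best_score:
                best, best_score = values, score
    return (best, best_score) if best is not None else (None, 0.0)


# Hypothetical example: a "total" printed on the card must equal the sum of two other numbers.
candidates = {
    "part_a": [("120", 0.9), ("128", 0.6)],
    "part_b": [("35", 0.8), ("85", 0.7)],
    "total": [("163", 0.85), ("155", 0.5)],
}
rules = [lambda v: int(v["total"]) == int(v["part_a"]) + int(v["part_b"])]
values, score = best_combination(candidates, rules)
# values == {"part_a": "128", "part_b": "35", "total": "163"}: 128 + 35 = 163 scores
# 0.6 * 0.8 * 0.85 = 0.408, beating 120 + 35 = 155 at 0.9 * 0.8 * 0.5 = 0.36.
```

Brute force grows multiplicatively with the number of fields and variants, so a real engine would prune; for a card with a few numbers it is enough to show the idea.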
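With server-side recognition the phone only has to upload the photo. A minimal client-side sketch of that upload using Python's standard library (on an actual phone you would use the platform's own HTTP API). The endpoint URL and the `X-Document-Template` header are hypothetical stand-ins for whatever your own back-end around the recognition engine expects; they are not a FlexiCapture interface.

```python
import urllib.request

# Hypothetical endpoint of your own back-end that wraps the recognition engine.
RECOGNITION_URL = "https://example.com/api/recognize"


def build_upload_request(image_bytes, template="card-v1", url=RECOGNITION_URL):
    """Build (without sending) a POST that uploads a JPEG photo for server-side recognition.

    `template` names the document description the server should apply; passing it
    in a custom header is a made-up convention for this sketch.
    """
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={"Content-Type": "image/jpeg", "X-Document-Template": template},
    )


# Sending it requires a real server, e.g.:
# with urllib.request.urlopen(build_upload_request(photo_bytes), timeout=30) as resp:
#     recognized = resp.read()
```

This keeps the client thin and identical across mobile platforms, at the cost of needing network access and uploading every photo.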
Disclaimer: I work for ABBYY.