OCR for extracting text from cedula/passport C#

I'm looking for an OCR engine along the lines of Tesseract or Google's Vision API that can extract text from an image of a passport or ID card. The image may be captured with a mobile phone or scanned, so the frame size may vary a little. I have been through several posts and found Tesseract to be the preferred solution.

I also ran my test data through the Vision API and got satisfactory results, about 99% accurate. However, I have the following problems and requirements:

Problems:

  • Tesseract is the solution suggested in most of the posts I have been through, but it gives very bad results because the frame may vary. I can't train it on my data, and I'm okay with any paid library that handles my scenario.
  • The Vision API gives accurate results, but my requirement is not to use a cloud-based solution.
  • There are a few providers (e.g. LeadTool, IdScan) that offer this feature, but they scan the passport with their own scanners first, so their SDKs only work with their scanner devices.

Summary: Is there any C# library (paid or open source) that takes a passport/cedula image as input and returns accurate text? Any suggestion or help will be appreciated.

Zeeshan asked Aug 17 '16 05:08


2 Answers

A company called MicroBlink created the BlinkID SDK for scanning passports and ID cards. It is not free for commercial use, but it is free for development. Link to the SDK's site HERE.

Tesseract may be giving you poor results because you have probably not preprocessed the image before running OCR. Preprocessing is mandatory if you want proper results, especially for images of passports, IDs and the like. For image processing you can use OpenCV (free), though it may take some time to learn computer vision and image processing (both of which are very rewarding, actually).
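As a rough illustration of what "processing before OCR" can mean, the sketch below converts an image to grayscale and binarizes it with Otsu's threshold, a common first step before handing a document image to Tesseract. It is a minimal pure-Python sketch over nested lists, not the answerer's pipeline; the function names are made up for illustration. In a real project the same step would more likely be done with OpenCV (for example `cv2.threshold` with the `THRESH_OTSU` flag), usually together with deskewing and cropping to the document.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def otsu_threshold(gray_rows):
    """Return the gray level that maximizes between-class variance (Otsu)."""
    histogram = [0] * 256
    for row in gray_rows:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    weighted_total = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count = 0
    background_sum = 0
    for level in range(256):
        background_count += histogram[level]
        if background_count == 0:
            continue  # no pixels at or below this level yet
        foreground_count = total - background_count
        if foreground_count == 0:
            break  # every pixel is already in the background class
        background_sum += level * histogram[level]
        mean_background = background_sum / background_count
        mean_foreground = (weighted_total - background_sum) / foreground_count
        variance = (
            background_count
            * foreground_count
            * (mean_background - mean_foreground) ** 2
        )
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray_rows, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray_rows]
```

Typical use would be `gray = to_grayscale(pixels)` followed by `binarize(gray, otsu_threshold(gray))`, with the result saved and passed to the OCR engine. The logic ports directly to C#.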

Dainius Šaltenis answered Oct 06 '22 14:10


I'm one of the developers at MicroBlink, a company specializing in the development of barcode and OCR solutions.

Tesseract is indeed one of the options you have. The problem with Tesseract is that it is hard to set the right parameters to get really accurate OCR results. You also still need to implement the data extraction logic on top of the OCR results, and integration on iOS and Android requires two separate codebases.

Google Cloud Vision gives very accurate OCR results but, as you said, it performs image processing on the server side, which raises privacy and security concerns about sending private ID information over the network to third parties.

There are other companies developing similar products with similar properties (server side, no data extraction, etc.).

MicroBlink's BlinkID is different in that it performs all processing locally, without a server-side connection. It uses our proprietary machine-learning-based OCR engine to ensure data is captured correctly. It supports MRZ, PDF417 barcodes, and scanning the front side of some ID documents (such as UK driver's licenses, Malaysian IDs, EU IDs...). All ID data is parsed and verified according to each country's standards, with checksum validation.

BlinkID is provided as native iOS, Android and Windows Phone 8 SDKs, as PhoneGap / Cordova plugins for iOS and Android, and as a Xamarin component (C#) for iOS and Android.

There is also a server-side library (available on request) that runs on Linux, Windows and macOS. It has a C API and can be used from a .NET application via C++/CLI. Our development team is here to help with the integration in a .NET app.
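To make "checksum validation" concrete for the MRZ case: the sketch below implements the publicly documented ICAO 9303 check-digit rule, which is also useful for sanity-checking raw OCR output from any engine. It is not BlinkID's code, and the function names are hypothetical. Each character is mapped to a number (digits as themselves, A-Z as 10-35, the filler `<` as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10.

```python
MRZ_WEIGHTS = (7, 3, 1)  # repeating weights defined by ICAO 9303


def mrz_char_value(char):
    """Numeric value of one MRZ character under ICAO 9303."""
    if "0" <= char <= "9":
        return int(char)
    if "A" <= char <= "Z":
        return ord(char) - ord("A") + 10
    if char == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {char!r}")


def mrz_check_digit(field):
    """Compute the check digit (0-9) for an MRZ field such as a date."""
    total = sum(
        mrz_char_value(char) * MRZ_WEIGHTS[index % 3]
        for index, char in enumerate(field)
    )
    return total % 10


def mrz_field_is_valid(field, check_char):
    """True if check_char is the correct check digit for field."""
    is_single_digit = len(check_char) == 1 and "0" <= check_char <= "9"
    return is_single_digit and mrz_check_digit(field) == int(check_char)
```

For example, `mrz_check_digit("740812")` returns 2 (7·7 + 4·3 + 0·1 + 8·7 + 1·3 + 2·1 = 122). A mismatch between the computed digit and the one read from the image is a strong hint that the OCR misread a character in that field.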

Please contact [email protected] for more information on the subject.

Cerovec answered Oct 06 '22 15:10