
How to use OpenCV+Tesseract for accurate Text recognition in Android?

I am trying to use OpenCV (Android) to process an image taken with the camera and then pass it to Tesseract for text (digit) recognition, but I only get good results when the images are very clean (almost no noise). Currently I apply the following processing to captured images:

  1. Apply a Gaussian blur.
  2. Apply an adaptive threshold to binarize the image.
  3. Invert the colours to make the background black.

Then I pass the processed image to Tesseract.

But I am not getting good results.

Please suggest further steps I could take to process the image before passing it to Tesseract, or changes I could make on the Tesseract side.

Also, are there any better OCR libraries for Android?
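For reference, the three preprocessing steps described in the question (Gaussian blur, adaptive threshold, inversion) can be sketched in dependency-free Python on a grayscale image stored as a list of rows of 0-255 values. This is an illustrative stand-in, not OpenCV: on Android the corresponding calls would be `Imgproc.GaussianBlur`, `Imgproc.adaptiveThreshold` (with `ADAPTIVE_THRESH_MEAN_C`) and `Core.bitwise_not`, whose border handling and rounding differ from the simplified versions here. The 3x3 kernel, the 5x5 block and the offset `c=5` are arbitrary example values.

```python
def gaussian_blur3(img):
    """3x3 Gaussian blur, kernel (1 2 1)^T (1 2 1) / 16, borders replicated."""
    h, w = len(img), len(img[0])
    k = (1, 2, 1)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1] * k[dx + 1] * img[yy][xx]
            row.append((acc + 8) // 16)  # divide by the kernel sum, rounded
        out.append(row)
    return out


def adaptive_threshold_mean(img, block=5, c=5):
    """255 where a pixel is brighter than (mean of its block x block window - c), else 0.

    The window is clipped at the image border.
    """
    h, w = len(img), len(img[0])
    r = block // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(255 if img[y][x] > sum(window) / len(window) - c else 0)
        out.append(row)
    return out


def invert(img):
    """Swap black and white so the background becomes black."""
    return [[255 - p for p in row] for row in img]


def preprocess(img):
    """The question's pipeline: blur -> adaptive threshold -> invert."""
    return invert(adaptive_threshold_mean(gaussian_blur3(img)))


if __name__ == "__main__":
    page = [[200] * 7 for _ in range(7)]
    for row in page:
        row[3] = 40  # one-pixel-wide dark vertical stroke
    for row in preprocess(page):
        print(row)  # each row: [0, 0, 255, 255, 255, 0, 0]
```

In this toy example the blur widens a one-pixel stroke to three pixels after thresholding, which is why the block size, offset and blur strength usually have to be tuned to the character size in the camera frames.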

Asked Apr 29 '14 10:04 by arorak



1 Answer

You can isolate/detect the characters in the image before running OCR. This can be done with algorithms such as the Stroke Width Transform (SWT).

The following steps worked well for me:

  1. Convert the image to grayscale.
  2. Perform Canny edge detection on the grayscale image.
  3. Apply a Gaussian blur to the grayscale image (store the result in a separate matrix).
  4. Feed the matrices from steps 2 and 3 into the SWT algorithm.
  5. Binarize (threshold) the resulting image.
  6. Feed the image to Tesseract.
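Steps 1 and 5 are simple enough to sketch directly. Below is a minimal pure-Python version, assuming the image is a list of rows of `(r, g, b)` tuples and that the SWT output from step 4 is a map holding a stroke width per pixel, or `None` where no stroke was found. Binarizing by a stroke-width range is one possible reading of step 5; `min_w` and `max_w` are hypothetical tuning values, and the output is dark text on a white background.

```python
def to_gray(rgb):
    """Step 1: luma Y = 0.299 R + 0.587 G + 0.114 B (the weights OpenCV's RGB2GRAY uses)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize_swt(swt, min_w=1.0, max_w=10.0):
    """Step 5: black (0) where the stroke width lies in [min_w, max_w], white (255) elsewhere.

    min_w / max_w are hypothetical tuning values; pixels without a width (None)
    become background.
    """
    return [[0 if s is not None and min_w <= s <= max_w else 255 for s in row]
            for row in swt]


if __name__ == "__main__":
    print(to_gray([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]))  # [[255, 0, 76]]
    print(binarize_swt([[None, 3.0, 2.0, 40.0]]))  # [[255, 0, 0, 255]]
```

Rejecting very large widths (the `40.0` above) is a cheap way to drop big dark blobs that are unlikely to be character strokes.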

Please note that for step 4 you will need to build the C++ library from the link and then import it into your Android project through JNI wrappers. You will also need to fine-tune the parameters of every step to get the best results, but this should at least get you started.
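To see what step 4 computes before building the native library, here is a minimal pure-Python sketch of the stroke width transform idea (not the linked C++ code). From each edge pixel it walks against the intensity gradient into the dark stroke; if the walk reaches an edge pixel whose gradient points roughly the opposite way (within 30 degrees here), the walk length is recorded as the stroke width of every pixel on it, keeping the minimum. Assumptions: dark text on a light background; `simple_edges` is a hypothetical stand-in for Canny (a plain gradient-magnitude threshold without non-maximum suppression); and, as a simplification, edge pixels on the same side of the stroke are stepped over instead of ending the ray.

```python
import math


def sobel(img):
    """Sobel gradients (gx, gy) of a grayscale image, borders replicated."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    gx = [[0] * w for _ in range(h)]
    gy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx[y][x] = (px(y - 1, x + 1) + 2 * px(y, x + 1) + px(y + 1, x + 1)
                        - px(y - 1, x - 1) - 2 * px(y, x - 1) - px(y + 1, x - 1))
            gy[y][x] = (px(y + 1, x - 1) + 2 * px(y + 1, x) + px(y + 1, x + 1)
                        - px(y - 1, x - 1) - 2 * px(y - 1, x) - px(y - 1, x + 1))
    return gx, gy


def simple_edges(gray, thresh=100.0):
    """Hypothetical stand-in for Canny: True where gradient magnitude >= thresh."""
    gx, gy = sobel(gray)
    return [[math.hypot(a, b) >= thresh for a, b in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]


def stroke_width_transform(edges, smooth, max_width=50):
    """Simplified SWT. edges: boolean map (step 2); smooth: blurred gray image (step 3).

    Returns the smallest stroke width found per pixel, or None where no ray was accepted.
    """
    h, w = len(edges), len(edges[0])
    gx, gy = sobel(smooth)

    def unit(x, y):
        m = math.hypot(gx[y][x], gy[y][x])
        return (gx[y][x] / m, gy[y][x] / m) if m else None

    min_cos = math.cos(math.pi / 6)  # q must point within 30 degrees of -d
    swt = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = unit(x, y) if edges[y][x] else None
            if d is None:
                continue
            ray = [(x, y)]
            for step in range(1, max_width + 1):
                # Move against the gradient: from the light side into the dark stroke.
                cx, cy = round(x - d[0] * step), round(y - d[1] * step)
                if not (0 <= cx < w and 0 <= cy < h):
                    break  # left the image: discard this ray
                ray.append((cx, cy))
                q = unit(cx, cy) if edges[cy][cx] else None
                if q is None:
                    continue
                if d[0] * q[0] + d[1] * q[1] <= -min_cos:
                    width = math.hypot(cx - x, cy - y)
                    for rx, ry in ray:
                        if swt[ry][rx] is None or width < swt[ry][rx]:
                            swt[ry][rx] = width
                    break
                # Same-side edge pixel (thick edge): keep walking (simplification).
    return swt


if __name__ == "__main__":
    # 9x9 light image with a dark vertical bar in columns 3..5; the raw image is
    # passed as `smooth` (no blur) to keep the numbers easy to follow.
    img = [[40 if 3 <= x <= 5 else 200 for x in range(9)] for _ in range(9)]
    print(stroke_width_transform(simple_edges(img), img)[4])
    # [None, None, 3.0, 2.0, 2.0, 2.0, 3.0, None, None]
```

The bar is 3 pixels wide but its interior gets a width of 2, because the thresholded Sobel edges are two pixels thick on each side; a real Canny edge map, which thins edges, should give tighter estimates.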

Answered Oct 03 '22 00:10 by AmmarCSE