
Improve pre-processing steps in Tesseract OCR for realtime capture

Tags: ios, ocr, tesseract

I am working on reading identity card information using the Tesseract library. I tried it on some images from Google and got good results, but with real-time images, that is, images captured from an iPhone camera, the results were poor.

I found some pre-processing steps suggested by Tesseract.

1. Fix DPI (if needed); 300 DPI is the minimum.

How can I set the DPI of an image when capturing it from the iPhone camera in real time?

2. Fix text size (e.g. 12 pt should be okay).

How do I fix the text size for the large image created by the iPhone camera?

3. Try to fix text lines (deskew and dewarp text).

I read that Tesseract dewarps text using the Leptonica library. Is dewarping or deskewing still needed at this pre-processing stage?

4. Try to fix illumination of image (e.g. no dark part of image).

Can I correct the illumination of the image using OpenCV?

5. Binarize and de-noise image.

I get poorly binarized images when I apply a threshold or an adaptive threshold to the real-time image.

How can I binarize these real-time images?

asked Sep 05 '14 07:09 by balajichinna



1 Answer

    1. and 2.: Text with a point size of 12 is 12/72 of an inch tall, so it takes up 12 pixels of height at 72 DPI. At 300 DPI this is about 50 pixels. So what you should take from 1. and 2. is that you should try to choose the resolution of the captured image so that the lines of text are around 50 pixels tall. How you would do this depends on how you are capturing the image.
    3. It is easier to ask the user to hold the camera straight :-)
    4. and 5.: You could try to apply some filtering. Again, it might be easier to ask the user to ensure proper lighting.
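
The point-size arithmetic from the first item can be written out as a small sketch. This is only an illustration: the function names (`pt_to_px`, `scale_factor`, `resized_dimensions`) are made up for this example and are not part of Tesseract or any iOS API, and the measured line height of 125 pixels is an assumed value you would have to estimate from your own captures.

```python
def pt_to_px(point_size: float, dpi: float) -> float:
    """Height in pixels of text of the given point size at the given DPI.

    One point is 1/72 of an inch, so pixels = points * dpi / 72.
    """
    return point_size * dpi / 72.0


def scale_factor(measured_line_height_px: float,
                 point_size: float = 12.0,
                 dpi: float = 300.0) -> float:
    """Factor by which to resize an image so that its text lines end up
    about as tall as `point_size` text scanned at `dpi`."""
    if measured_line_height_px <= 0:
        raise ValueError("measured line height must be positive")
    return pt_to_px(point_size, dpi) / measured_line_height_px


def resized_dimensions(width: int, height: int, factor: float) -> tuple[int, int]:
    """New (width, height) after scaling both sides by `factor`."""
    return round(width * factor), round(height * factor)


if __name__ == "__main__":
    print(pt_to_px(12, 72))    # 12.0 pixels at 72 DPI
    print(pt_to_px(12, 300))   # 50.0 pixels at 300 DPI

    # Assumed example: text lines in the captured photo are ~125 px tall.
    factor = scale_factor(125)
    print(factor)              # 0.4, i.e. shrink the photo to 40 %
```

In other words, you do not set a DPI on the camera image as such; you resize it (or move the camera) until the text height lands near the target, and `resized_dimensions` gives the size to pass to whatever resize routine you use.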
answered Sep 27 '22 19:09 by Jesper Schläger