I am working on OCR of printed text. In particular, I am focusing on the preprocessing step to improve the results of the Tesseract engine. I have already obtained good results with adaptive thresholding, noise removal, text deskew, etc., but Tesseract still fails where other commercial products return decent results.
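For reference, this is roughly the kind of pipeline I am running (a simplified sketch with OpenCV and pytesseract; the file name, filter sizes and thresholds are placeholders, not my exact values):

import cv2
import numpy as np
import pytesseract

# Load the scan in grayscale
img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# 1. Noise removal (median filter keeps edges reasonably sharp)
img = cv2.medianBlur(img, 3)

# 2. Adaptive thresholding -> binary image, dark text on white background
binary = cv2.adaptiveThreshold(img, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 15)

# 3. Deskew: estimate the dominant skew from the minimum-area rectangle
#    around the ink pixels (note: the angle convention of cv2.minAreaRect
#    changed between OpenCV versions, so this correction may need adjusting)
coords = np.column_stack(np.where(binary < 128)).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]
angle = -(90 + angle) if angle < -45 else -angle
h, w = binary.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
binary = cv2.warpAffine(binary, M, (w, h),
                        flags=cv2.INTER_CUBIC,
                        borderMode=cv2.BORDER_REPLICATE)

# 4. Hand the cleaned binary image to Tesseract
print(pytesseract.image_to_string(binary))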
I used the following test image, and here are the results obtained with Tesseract 3.04 compared to two commercial OCR APIs. All three services were given the same binary image, which contains some slightly blurred text.
Tesseract
Careers in Technology Consulting
Networking Lunch
21 m 2014, 11:00 - 14:30
Definingthecorporatellstmtegy, Wammmwdngdeal, creating
uniquebwinessisighnwilgbigdam-doesflismflxemmyouafioy?
Findoutmoreabanhowitfeektomkasatedlflogymbyjoiningour
for further mm please visit mAeloittexom/weers
ABBYY Fine Reader Online
Careers in Technology Consulting
Networking Lunch
21 November 2014,1140-14:30
Defining the corporate IT strategy, planning a multHnKon <Mar outsourcing deal, creating unique business insights using big data-doesthis sound Ifce something you enjoy?
Find out more about hour it feels to work as a technology consultant by joining our exclusive networking lunch,
For further information please visit wrwMuleloittexom/carcert
Online OCR
Careers in Technology Consulting Networking Lunch 21 November 2014, 11;00 —14:30
Defining the corporate IT strategy, planning a muiti-indlimi dollar outsourcing deal, creating unique business insights using big data—does this sound like something you enjoy?
Find out more about how it feels to work as a tedmology consultant by joining our exclusive networking lunch,
For further information' please visit wwwdeloitte,com/careers
Now I wonder whether the big gap between Tesseract and the other two products is due to a different engine (ABBYY certainly uses its own engine; I am not sure about OCR Web Service) or whether there are other preprocessing steps that can be done before running Tesseract. Do you have any suggestions?
Google does well on the scanned email and recognizes the text in the smartphone-captured document about as well as ABBYY. However, it is much better than Tesseract or ABBYY at recognizing handwriting, as the second result image shows: still far from perfect, but at least it gets some things right.
While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, that is, its ability to correctly recognize characters in a scan or image.
Here is a suggestion for some "magic" OCR preprocessing. To explain the principle of the proposed idea, let's consider an excerpt from the provided text image on which all of the tested OCR engines failed:
and apply some "preprocessing wisdom" to it. First, the usual thresholding:
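In code, that first step could look roughly like this (a sketch using OpenCV with Otsu's method; the file names are only placeholders):

import cv2

# Binarize the excerpt; Otsu picks a global threshold automatically,
# giving black text (0) on a white background (255)
gray = cv2.imread("excerpt.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("excerpt_binary.png", binary)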
and then some "magic": shoot vertical scan lines through the word elements, detect "bars" that are at most 2 pixels high, cut them at their edges, and also trim the word element down to its bottom line:
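A crude way to implement this vertical-scan step could look like the following sketch (it assumes black text (0) on a white background (255); the 2-pixel limit is the one mentioned above, and the "trim down to the bottom line" part is left out):

import cv2

def cut_thin_horizontal_bars(binary, max_bar_height=2):
    """Walk down every column ("vertical scan line") of a binarized word
    image and erase foreground runs that are at most max_bar_height pixels
    tall. Such short runs are where the scan line crosses a thin horizontal
    "bar" that glues characters together."""
    ink = binary < 128
    out = binary.copy()
    h, w = binary.shape
    for x in range(w):
        y = 0
        while y < h:
            if ink[y, x]:
                start = y
                while y < h and ink[y, x]:
                    y += 1
                if y - start <= max_bar_height:
                    out[start:y, x] = 255   # cut the thin bar at this column
            else:
                y += 1
    return out

binary = cv2.imread("excerpt_binary.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("excerpt_cut_v.png", cut_thin_horizontal_bars(binary))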
Now switch from shooting vertical lines through the word elements to horizontal ones, in order to detect very wide "bars" and cut them vertically in the middle of their width:
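The horizontal-scan step could then be sketched in the same style (the minimum bar width is a guess and would need tuning to the image resolution):

import cv2

def split_wide_bars(binary, min_bar_width=40):
    """Walk along every row ("horizontal scan line") of the binarized word
    image and, wherever a foreground run is at least min_bar_width pixels
    wide, erase one pixel in the middle of that run so the wide "bar" is
    split into two halves."""
    ink = binary < 128
    out = binary.copy()
    h, w = binary.shape
    for y in range(h):
        x = 0
        while x < w:
            if ink[y, x]:
                start = x
                while x < w and ink[y, x]:
                    x += 1
                if x - start >= min_bar_width:
                    out[y, (start + x) // 2] = 255   # cut the wide bar in the middle
            else:
                x += 1
    return out

binary = cv2.imread("excerpt_cut_v.png", cv2.IMREAD_GRAYSCALE)
cv2.imwrite("excerpt_cut_hv.png", split_wide_bars(binary))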
This should help any OCR engine provide better results on this particular image. I can imagine that some of the commercial OCR engines already use this kind of approach, which would explain why they provide better recognition than the engines tested here.
In this context, let me mention other free OCR engines available in the Ubuntu repositories (comparable with Tesseract). Testing them against each other, you may wonder even more why they produce different results; you can then look into their source code to find out :) and infer from this experience something about the commercial engines.
sudo apt-get install cuneiform gocr ocrad
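To run them against each other on the same input, something like this rough sketch could be used (the output file names and the ImageMagick conversions are my assumptions; gocr and ocrad want PNM input and cuneiform is happiest with BMP, so converting first is the safe bet):

import subprocess

# Convert the binarized PNG into formats the other engines accept
# (the `convert` tool comes from ImageMagick)
subprocess.run(["convert", "binary.png", "binary.pgm"], check=True)
subprocess.run(["convert", "binary.png", "binary.bmp"], check=True)

engines = {
    "tesseract": ["tesseract", "binary.png", "out_tesseract"],      # -> out_tesseract.txt
    "cuneiform": ["cuneiform", "-o", "out_cuneiform.txt", "binary.bmp"],
    "gocr":      ["gocr", "-o", "out_gocr.txt", "binary.pgm"],
    "ocrad":     ["ocrad", "-o", "out_ocrad.txt", "binary.pgm"],
}
for name, cmd in engines.items():
    subprocess.run(cmd, check=False)   # each engine writes its own text file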