Image processing for OCR with leptonica (inverse color text)

I am trying to process the following image with Leptonica so that I can extract its text with Tesseract.

Original Image:

Tesseract on the original image yields this:

i s l
D2J1FiiE-l191x1iitmwii9 uhiaiislz-2 Q ~37
Bottom linez
With a little time!
you can learn social media technology
using free online resources-
And if you donity
youlll be at a significant disadvantage
to
other HOn-pFOiiTS-

Not great, especially around the top background. So, using Leptonica, I apply a background removal algorithm (blur, difference, threshold, invert), which produces the image below.

Processed Image:

But Tesseract doesn't do a good job with it:

@@r-mair lkrm@W lh@w ilr@ mJs@ iklh@ ii@c2lhm1@ll
mm Mime
VWU1 a Mitt-Jle time-
@1m ll@@Wn Om @@@lh1
using free onhne resources-
Andifyoudoni
9110 ate a $0 D
to other non-profrts
I

The main problem, it seems, is that all of the text is now outlined instead of solid. How can I adjust my algorithm, or what can I add, to make the text solid?
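
For reference, the blur, difference, threshold, invert pipeline described in the question can be sketched roughly as follows. This is a minimal pure-Python illustration, not Leptonica code: the helper names box_blur and remove_background, the box filter, and the radius and threshold values are all assumptions made for the example. It also shows one plausible cause of outlined text: when the blur window is narrower than a stroke, the local mean inside the stroke equals the stroke itself, so only the edges survive the difference step.

```python
# Hypothetical sketch of blur -> difference -> threshold -> invert.
# Images are lists of rows of 0..255 grayscale values.
# None of these helper names come from Leptonica.

def box_blur(img, radius):
    """Mean filter with a (2*radius+1) square window, clamped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total // count
    return out


def remove_background(img, radius, threshold):
    """Black (0) where a pixel differs from its local mean by more than
    `threshold`, white (255) everywhere else."""
    blurred = box_blur(img, radius)
    return [
        [0 if abs(p - b) > threshold else 255 for p, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]


if __name__ == "__main__":
    # One row: a 9-pixel-wide light stroke on a dark background.
    row = [40] * 6 + [220] * 9 + [40] * 6
    for radius in (1, 6):
        out = remove_background([row], radius, 50)[0]
        print(radius, "".join("#" if v == 0 else "." for v in out))
```

In this toy row, a radius of 1 leaves the centre of the 9-pixel stroke white (hollow), while a radius of 6 fills it. The absolute difference also marks background pixels next to the stroke, so a wider blur thickens the glyphs; whether that trade-off helps on the real image would need testing.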


asked Jul 26 '12 21:07

jasonlfunk

1 Answer

It seems that this paper proposes a binarization method which solves your problem:

T. Kasar, J. Kumar, and A. G. Ramakrishnan. Font and Background Color Independent Text Binarization (2007).

Kasar et al. method performance
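
The general idea of handling both text polarities can be illustrated with a much simpler sketch. This is not the algorithm from the cited paper, only a hypothetical pure-Python example: the function name binarize_region, the use of border pixels as a background estimate, and the midpoint threshold are all assumptions. It thresholds on intensity rather than on edges, so strokes stay solid, and it flips the output when the text is lighter than its background.

```python
from statistics import median


def binarize_region(region):
    """Binarize one grayscale region to black text (0) on white (255).

    Assumption: the pixels on the region's border are mostly background.
    """
    h, w = len(region), len(region[0])
    border = [
        region[y][x]
        for y in range(h)
        for x in range(w)
        if y in (0, h - 1) or x in (0, w - 1)
    ]
    background = median(border)

    # Take the intensity extreme farthest from the background as the
    # foreground (text) estimate.
    pixels = [p for row in region for p in row]
    lo, hi = min(pixels), max(pixels)
    foreground = lo if background - lo >= hi - background else hi

    threshold = (background + foreground) / 2
    if foreground < background:
        # Dark text on a light background: keep the polarity.
        return [[0 if p < threshold else 255 for p in row] for row in region]
    # Light text on a dark background: invert while thresholding.
    return [[0 if p > threshold else 255 for p in row] for row in region]


if __name__ == "__main__":
    inverse = [[30] * 5, [30, 230, 230, 230, 30], [30] * 5]
    normal = [[220] * 5, [220, 20, 20, 20, 220], [220] * 5]
    print(binarize_region(inverse) == binarize_region(normal))
```

Applied per text region (for example, per line or per block), a step like this would give black text on white for both normal and inverse text, which is the form Tesseract is usually reported to handle best.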

answered Nov 14 '22 07:11

sastanin

Tags: image-processing, ocr, tesseract