Detect white characters on black background using Tesseract

Tags:

tesseract

I'm completely new to Tesseract OCR. This problem might be simple but I can't seem to find the answer using Google.

Basically, I have an image that contains two parts: the top part has a black background with white text, and the bottom part has a white background with black text.

I ran Tesseract on the image. It correctly recognized all characters in the bottom part, but none in the top part. I am sure that the characters in the top part are very clear and should be easy for Tesseract to recognize. The only difference is that the top part has a black background.

Is there a way to use Tesseract to recognize text on both black and white backgrounds at the same time?

asked Aug 17 '16 17:08

Chaoran

1 Answer

A paper by T. Kasar, J. Kumar, and A. G. Ramakrishnan describes one solution to the problem: "Font and Background Color Independent Text Binarization". The paper can be found here. There is an implementation of the algorithm by Jason Funk. His implementation can be found here. I have had some success with the algorithm. I think this type of solution is what you are looking for.

You might also find it helpful to review this recently asked question on background removal (OpenCV for OCR: How to compute thresholding levels for gray image OCR) and its answer. You may be able to separate regions of interest by background color and then hand each region to Tesseract for processing. Alternatively, after binarization you could invert the 8x8 pixel regions (described in the answer above) in the black-background portion of the image (or vice versa) to create a uniform background.
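As a rough illustration of the inversion idea, here is a minimal sketch in plain Python. It is not the method from the paper or from the linked answer. It assumes the image is already a grayscale grid of 0-255 values stored as a list of rows, and it treats any block whose mean intensity is below a threshold as "dark background". The function name `normalize_polarity` and the default threshold of 128 are my own choices for illustration only.

```python
def normalize_polarity(gray, block=8, threshold=128):
    """Return a copy of `gray` in which every dark block is inverted.

    gray      -- list of rows, each a list of ints in 0..255 (assumed layout)
    block     -- side length of the square blocks, in pixels
    threshold -- blocks with a mean below this are treated as dark background
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    out = [row[:] for row in gray]  # leave the input untouched

    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))

            # Mean intensity of the block decides its polarity.
            total = sum(gray[y][x] for y in rows for x in cols)
            mean = total / (len(rows) * len(cols))

            if mean < threshold:
                # Dark background: flip so the text becomes dark on light.
                for y in rows:
                    for x in cols:
                        out[y][x] = 255 - gray[y][x]
    return out
```

The result has dark text on a light background throughout, so it could then be handed to Tesseract as a single image (for example through pytesseract's `image_to_string`, after converting the grid back to an image). A block that is mostly covered by text strokes can be misclassified by the mean test, so larger blocks, or whole regions as suggested above, may work better on real scans.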

Finally, you may find some useful information by searching for solutions to the number plate (license plate) recognition problem. Many number plates have background images or lighting artifacts that can interfere with recognition. The more general problem is background removal.

answered Oct 15 '22 05:10

John Morris
