How to improve OCR on an image with text in different colors and fonts?

Q: How to improve accuracy of reading text from image documents?

Reading text from image documents with any OCR engine raises many issues that stand in the way of good accuracy. There is no single solution for all cases, but here are a few things to consider to improve OCR results. 1) Noise caused by poor image quality or by unwanted elements/blobs in the background region.

Tags: python, python-imaging-library, ocr, google-vision

I'm using the Google Vision API to extract text from some pictures. I have been trying to improve the accuracy (confidence) of the results, with no luck.

Every time I alter the original image, I lose accuracy in detecting some characters.

I have isolated the issue to the words being in different colors: words in red, for example, come back with incorrect results more often than the other words.

Example:

Some variations of the image, in grayscale or black and white

Original Image

What can I try to make this work better? Specifically, how can I change the text to a uniform color, or to plain black on a white background, since most algorithms expect that?

Here are some ideas I already tried; I also tried some thresholding.

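from PIL import Image, ImageEnhance, ImageOps

im = Image.open("input.png")  # placeholder path for the source image

dimg = ImageOps.grayscale(im)  # grayscale copy
cimg = ImageOps.invert(dimg)   # inverted grayscale

# A factor of 1 returns an unchanged copy; values above 1 strengthen the effect.
contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(1)

sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(1)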
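
One further variation on the "uniform black on white" idea, as a minimal sketch rather than a tested fix: take the per-pixel minimum of the R, G and B channels before thresholding, so that saturated colors such as red end up as dark as black text. This assumes a light background; the function name `to_black_on_white` and the default `threshold` are illustrative choices, not something from the Vision API or from the code above.

```python
from PIL import Image, ImageChops


def to_black_on_white(im, threshold=160):
    """Map dark or strongly colored pixels to black and the rest to white."""
    r, g, b = im.convert("RGB").split()
    # Per-pixel minimum of the three channels: white stays at 255, while
    # pure red (255, 0, 0) or pure blue (0, 0, 255) drops to 0, like black.
    darkest = ImageChops.darker(ImageChops.darker(r, g), b)
    # Hard threshold to two levels; the cutoff is a guess to tune per image.
    return darkest.point(lambda v: 255 if v >= threshold else 0)
```

Calling `to_black_on_white(Image.open(path))` returns a mode "L" image that can be saved and sent to the OCR engine in place of the original. Whether this raises the confidence for a given picture would have to be checked case by case.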

asked Aug 11 '18 20:08

RaedMarji

1 Answer

I can only offer a butcher's solution, potentially a nightmare to maintain.

In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.

My prerequisites:

I knew exactly in which area of the screen the text was going to go.
I knew exactly which fonts and colors were going to be used.
The text was semitransparent, so the underlying image interfered, and that image varied to boot.
I could not reliably detect text changes, so I could not average frames to reduce the interference.

What I did:

- I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
- The program would start at position (0,0), measure the average color to determine the color, then access the whole set of bitmaps generated from characters in all available fonts in that color. Then it would determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one.
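
The matching loop described above can be sketched roughly as follows. This is a simplified Python illustration, not the original C program: it uses a single grayscale glyph set instead of one set per font and color, and the `GLYPHS` bitmaps, `glyph_distance` and `read_line` are made-up names and data for the example. Each bitmap includes its trailing spacing column, standing in for the measured character width.

```python
# Hypothetical 3-row glyph bitmaps: 0 is ink, 255 is background.
GLYPHS = {
    "I": [[0, 255], [0, 255], [0, 255]],
    "L": [[0, 255, 255], [0, 255, 255], [0, 0, 255]],
    "T": [[0, 0, 0, 255], [255, 0, 255, 255], [255, 0, 255, 255]],
}


def glyph_distance(screen, glyph, x):
    """Mean absolute pixel difference between a glyph and the screen at column x."""
    width = len(glyph[0])
    total = sum(
        abs(screen[row][x + col] - glyph[row][col])
        for row in range(len(glyph))
        for col in range(width)
    )
    return total / (len(glyph) * width)


def read_line(screen, glyphs):
    """Greedily read characters left to right by picking the closest glyph."""
    line_width = len(screen[0])
    x, text = 0, []
    while x < line_width:
        # Only glyphs that still fit in the remaining columns are candidates.
        candidates = [
            (glyph_distance(screen, bitmap, x), char)
            for char, bitmap in glyphs.items()
            if x + len(bitmap[0]) <= line_width
        ]
        if not candidates:
            break
        _, best = min(candidates)
        text.append(best)
        # Advance by the width of the matched glyph, spacing included.
        x += len(glyphs[best][0])
    return "".join(text)
```

Here `screen` is a list of pixel rows covering the known text area. Because the closest glyph wins rather than an exact match, moderate interference from the underlying image is tolerated as long as the right glyph stays closer than the wrong ones.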

(Months later, needing more performance, I added a varying probability matrix so that the most likely characters were tested first.)
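
One possible reading of that optimization, sketched under assumptions: the "probability matrix" is taken here to be a table of how often each character follows the previous one, and the search stops at the first candidate that is close enough. The helper names, the bigram interpretation and the `tolerance` cutoff are all guesses for illustration, not details of the original program.

```python
from collections import Counter, defaultdict


def build_bigram_counts(samples):
    """Count how often each character follows another in sample text."""
    counts = defaultdict(Counter)
    for text in samples:
        for prev, cur in zip(text, text[1:]):
            counts[prev][cur] += 1
    return counts


def candidate_order(prev_char, counts, alphabet):
    """Alphabet sorted so the likeliest successors of prev_char come first."""
    seen = counts.get(prev_char, Counter())
    # sorted() is stable, so characters with equal counts keep alphabet order.
    return sorted(alphabet, key=lambda ch: -seen[ch])


def first_good_match(order, distance_of, tolerance):
    """Return the first character within tolerance, else the closest one tried."""
    best_char, best_dist = None, float("inf")
    for ch in order:
        dist = distance_of(ch)
        if dist <= tolerance:
            return ch  # early exit: the remaining comparisons are skipped
        if dist < best_dist:
            best_char, best_dist = ch, dist
    return best_char
```

With a bitmap comparison plugged in as `distance_of`, most positions would be resolved after one or two comparisons instead of a pass over the whole character set.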

In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.

answered Oct 04 '22 19:10

LSerni
