 

How to Improve OCR on image with text in different colors and fonts?

I'm using the Google Vision API to extract text from some pictures; however, I have been trying to improve the accuracy (confidence) of the results with no luck.

Every time I change the original image, I lose accuracy in detecting some characters.

I have isolated the issue to the use of multiple colors for different words: words in red, for example, produce incorrect results more often than the other words.

Example:

Some variations of the image, from grayscale to black & white:

Original Image

[images omitted]

What ideas can I try to make this work better? Specifically, can I change the text colors to one uniform color, or just black on a white background, since most algorithms expect that?

Some ideas I already tried (along with some thresholding):

from PIL import ImageOps, ImageEnhance

dimg = ImageOps.grayscale(im)   # collapse all colors to luminance
cimg = ImageOps.invert(dimg)    # inverted (white-on-black) variant

contrast = ImageEnhance.Contrast(dimg)
eimg = contrast.enhance(2)      # note: a factor of 1 returns the image unchanged

sharp = ImageEnhance.Sharpness(dimg)
eimg = sharp.enhance(2)         # likewise, 1 is a no-op; >1 sharpens
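Since the core problem is words in different colors, one direct preprocessing idea is to remap every pixel near one of the (roughly known) text colors to black and everything else to white, so the OCR engine sees uniform black-on-white text. A minimal sketch with Pillow; the function name, color list, and tolerance are my own assumptions and would need tuning per image:

```python
from PIL import Image

def recolor_text_to_black(im, colors, tolerance=120):
    """Map any pixel close to a known text color to black; everything else to white.

    `colors` is a list of RGB tuples for the text colors believed to be in the
    image (assumed known); distance is a simple per-channel sum.
    """
    src = im.convert("RGB")
    out = Image.new("L", src.size, 255)        # start all-white
    px_in, px_out = src.load(), out.load()
    for y in range(src.height):
        for x in range(src.width):
            r, g, b = px_in[x, y]
            for cr, cg, cb in colors:
                if abs(r - cr) + abs(g - cg) + abs(b - cb) <= tolerance:
                    px_out[x, y] = 0           # treat as text -> black
                    break
    return out

# Tiny synthetic demo: a red "word" pixel and a dark "word" pixel on white
demo = Image.new("RGB", (3, 1), "white")
demo.putpixel((0, 0), (210, 20, 20))           # red text
demo.putpixel((1, 0), (40, 40, 40))            # dark text
uniform = recolor_text_to_black(demo, colors=[(255, 0, 0), (0, 0, 0)])
```

Both colored pixels end up black and the background stays white, which is the uniform input most OCR engines expect.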
asked Aug 11 '18 by RaedMarji


People also ask

Why is the Tesseract OCR not accurate?

Inevitably, noise in an input image, non-standard fonts that Tesseract wasn't trained on, or less than ideal image quality will cause Tesseract to make a mistake and incorrectly OCR a piece of text. When that happens, you need to create rules and heuristics that can be used to improve the output OCR quality.

How do I know if my OCR is accurate?

Measuring OCR accuracy is done by taking the output of an OCR run for an image and comparing it to the original version of the same text. You can then either count how many characters were detected correctly (character level accuracy), or count how many words were recognized correctly (word level accuracy).
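The comparison described above can be scripted with the standard library; a small sketch (function names are my own) that computes both character-level and word-level accuracy against a ground-truth transcription:

```python
import difflib

def char_accuracy(truth, ocr_output):
    """Character-level accuracy: matched characters / ground-truth length."""
    matcher = difflib.SequenceMatcher(None, truth, ocr_output)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth)

def word_accuracy(truth, ocr_output):
    """Word-level accuracy: matched words / ground-truth word count."""
    truth_words, ocr_words = truth.split(), ocr_output.split()
    matcher = difflib.SequenceMatcher(None, truth_words, ocr_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(truth_words)

# "w0rld" gets 10 of 11 characters right, but only 1 of 2 words
c = char_accuracy("hello world", "hello w0rld")
w = word_accuracy("hello world", "hello w0rld")
```

Note how a single wrong character barely dents character-level accuracy but halves word-level accuracy, which is why the two metrics are reported separately.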

Is Tesseract OCR good?

While Tesseract is known as one of the most accurate free OCR engines available today, it has numerous limitations that dramatically affect its performance, namely its ability to correctly recognize characters in a scan or image.

How to improve OCR accuracy?

When it comes to improving OCR accuracy, you basically have two moving parts in the equation: the source image and the OCR engine. If the quality of the original source image is good, i.e. if the human eye can see the original clearly, it will be possible to achieve good OCR results.

What is the best DPI for OCR?

For smaller text (font size < 8), 400–600 DPI is recommended. Anything above 600 DPI will only increase the image size, and thus the processing time, while showing no improvement in OCR accuracy. Binarization is the process of converting a colored image (RGB) into a black-and-white image.
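A minimal binarization pass in Pillow, as a sketch: stretch the contrast, then hard-threshold the grayscale image. The fixed threshold of 128 is an assumption; adaptive schemes such as Otsu's method usually cope better with uneven lighting.

```python
from PIL import Image, ImageOps

def binarize(im, threshold=128):
    """Grayscale + autocontrast, then map each pixel to pure black or white."""
    gray = ImageOps.autocontrast(im.convert("L"))   # stretch min..max to 0..255
    return gray.point(lambda p: 255 if p > threshold else 0)

# Synthetic demo: one dark and one light pixel
demo = Image.new("RGB", (2, 1))
demo.putpixel((0, 0), (30, 30, 30))     # dark  -> black
demo.putpixel((1, 0), (220, 220, 220))  # light -> white
bw = binarize(demo)
```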

How to improve accuracy of reading text from image documents?

Reading text from image documents with any OCR engine involves many issues when trying to get good accuracy. There is no fixed solution that covers all cases, but here are a few things to consider for improving OCR results: 1) the presence of noise due to poor image quality or unwanted elements/blobs in the background region.

How do I perform OCR on a PDF document?

Performing OCR on a document is literally a no-brainer because PDFelement tells you exactly what to do. The moment you open a non-editable PDF file or use the Create PDF to convert an image to PDF, it recognizes this and prompts you to install the OCR plugin and perform OCR. Here’s what you’ll see on your screen:


1 Answer

I can only offer a butcher's solution, potentially a nightmare to maintain.

In my own, very limited scenario, it worked like a charm where several other OCR engines either failed or had unacceptable running times.

My prerequisites:

  • I knew exactly in which area of the screen the text was going to go.
  • I knew exactly which fonts and colors were going to be used.
  • The text was semitransparent, so the underlying image interfered, and it was a variable image to boot.
  • I could not reliably detect text changes, so I could not average frames to reduce the interference.

What I did:

  • I measured the kerning width of each character. I only had A-Za-z0-9 and a bunch of punctuation characters to worry about.
  • The program would start at position (0,0), measure the average color to determine the text color, then access the whole set of bitmaps generated from characters in all available fonts in that color. It would then determine which rectangle was closest to the corresponding rectangle on the screen, and advance to the next one.

(Months later, needing more performance, I added a varying probability matrix to test the most likely characters first.)

In the end, the resulting C program was able to read the subtitles out of the video stream with 100% accuracy in real time.
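The matching step this answer describes (pick the pre-rendered glyph bitmap closest to the on-screen rectangle) can be sketched in a few lines. The answer's program was in C; this is a Python toy with made-up 3x3 "glyphs", using an L1 pixel distance as one plausible closeness measure:

```python
def best_match(cell, glyphs):
    """Return the character whose bitmap is closest (L1 distance) to `cell`.

    `glyphs` maps characters to equal-sized binary bitmaps pre-rendered in
    the known font and color, as the answer describes.
    """
    def dist(a, b):
        return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return min(glyphs, key=lambda ch: dist(glyphs[ch], cell))

# Toy 3x3 "font" with two fake glyphs
glyphs = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_I = [[0, 1, 0], [1, 1, 0], [0, 1, 0]]   # "I" with one flipped pixel
```

Because the distance is computed against every candidate, a probability-ordered candidate list (as the answer later added) lets you stop early once a near-zero distance is found.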

answered Oct 04 '22 by LSerni