Image processing to improve Tesseract OCR accuracy

Fix the DPI if needed; 300 DPI is the minimum.
Fix the text size (e.g. 12 pt should be OK).
Try to straighten the text lines (deskew and dewarp the text).
Try to even out the illumination of the image (e.g. no dark areas).
Binarize and de-noise the image.

No single command line fits every case (sometimes you need to blur and then sharpen the image), but you can try TEXTCLEANER from Fred's ImageMagick Scripts.

If you are not a fan of the command line, you can try the open-source scantailor.sourceforge.net or the commercial bookrestorer.

I am by no means an OCR expert, but this week I needed to extract text from a JPG.

I started with a color RGB JPG of 445x747 pixels. I tried Tesseract on it right away, and the program recognized almost nothing. I then opened the image in GIMP and did the following.

image > mode > grayscale
image > scale image > 1191x2000 pixels
filters > enhance > unsharp mask with values of
radius = 6.8, amount = 2.69, threshold = 0

I then saved as a new jpg at 100% quality.

Tesseract was then able to extract all the text into a .txt file.

GIMP is your friend.

As a rule of thumb, I usually apply the following image pre-processing techniques using the OpenCV library:

Rescaling the image (recommended if you're working with images below 300 DPI):
```
img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
```
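
The factor of 1.2 above is just a starting point. If the scan resolution is known (an assumption; many images carry no reliable DPI metadata), the factor can be derived from it instead. The helper name `scale_factor_for_dpi` is hypothetical.

```python
def scale_factor_for_dpi(current_dpi, target_dpi=300.0):
    """Return the resize factor that brings an image up to target_dpi.

    Images already at or above the target are left at their original size.
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)

# Hypothetical example: a 150 DPI scan needs to be doubled in each dimension.
factor = scale_factor_for_dpi(150)
```

The result can be passed as both `fx` and `fy` to `cv2.resize`.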

Converting image to grayscale:

img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Applying dilation and erosion to remove noise (tune the kernel size for your data set; note that the 1x1 kernel shown here leaves the image unchanged, so try (2, 2) or larger to see an effect):

kernel = np.ones((1, 1), np.uint8)
img = cv2.dilate(img, kernel, iterations=1)
img = cv2.erode(img, kernel, iterations=1)

Applying blur followed by thresholding, using one of the following expressions, each of which returns the binarized image (each has its pros and cons, but median blur and the bilateral filter usually perform better than Gaussian blur):

cv2.threshold(cv2.GaussianBlur(img, (5, 5), 0), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

cv2.threshold(cv2.bilateralFilter(img, 5, 75, 75), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

cv2.threshold(cv2.medianBlur(img, 3), 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]

cv2.adaptiveThreshold(cv2.GaussianBlur(img, (5, 5), 0), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

cv2.adaptiveThreshold(cv2.bilateralFilter(img, 9, 75, 75), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)

cv2.adaptiveThreshold(cv2.medianBlur(img, 3), 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 31, 2)
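
The `cv2.THRESH_OTSU` flag used in the first three expressions picks the threshold automatically from the image histogram. The NumPy sketch below illustrates the idea (maximizing between-class variance); it is not OpenCV's implementation and may break ties differently.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                  # weight of the class "<= t"
    w1 = 1.0 - w0                         # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]
    # Between-class variance for every candidate t; empty classes score 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

def binarize(gray):
    """Pixels above the Otsu threshold become 255, the rest 0."""
    return np.where(gray > otsu_threshold(gray), 255, 0).astype(np.uint8)

# Two-level example: left half dark (50), right half light (200).
two_level = np.array([[50] * 4 + [200] * 4] * 4, np.uint8)
t = otsu_threshold(two_level)
binary = binarize(two_level)
```

A global threshold like this works best on evenly lit pages; the adaptive variants above compute a separate threshold per neighbourhood (31x31 pixels in those calls) and cope better with shadows.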

I recently wrote a fairly simple guide to Tesseract. It should be enough to get you writing your first OCR script and past some hurdles I ran into where the documentation was less clear than I would have liked.

In case you'd like to check them out, here are the links:

Getting started with Tesseract - Part I: Introduction
Getting started with Tesseract - Part II: Image Pre-processing

Three steps to improve the readability of the image:

Resize the image, trying different sizes (multiply the image height and width by 0.5, 1, and 2).
Convert the image to grayscale.
Remove noise pixels to make the text clearer (filter the image).

See the code below:

Resize

public Bitmap Resize(Bitmap bmp, int newWidth, int newHeight)
        {
         
                Bitmap temp = (Bitmap)bmp;
            
                Bitmap bmap = new Bitmap(newWidth, newHeight, temp.PixelFormat);
             
                double nWidthFactor = (double)temp.Width / (double)newWidth;
                double nHeightFactor = (double)temp.Height / (double)newHeight;

                double fx, fy, nx, ny;
                int cx, cy, fr_x, fr_y;
                Color color1 = new Color();
                Color color2 = new Color();
                Color color3 = new Color();
                Color color4 = new Color();
                byte nRed, nGreen, nBlue;

                byte bp1, bp2;

                for (int x = 0; x < bmap.Width; ++x)
                {
                    for (int y = 0; y < bmap.Height; ++y)
                    {

                        fr_x = (int)Math.Floor(x * nWidthFactor);
                        fr_y = (int)Math.Floor(y * nHeightFactor);
                        cx = fr_x + 1;
                        if (cx >= temp.Width) cx = fr_x;
                        cy = fr_y + 1;
                        if (cy >= temp.Height) cy = fr_y;
                        fx = x * nWidthFactor - fr_x;
                        fy = y * nHeightFactor - fr_y;
                        nx = 1.0 - fx;
                        ny = 1.0 - fy;

                        color1 = temp.GetPixel(fr_x, fr_y);
                        color2 = temp.GetPixel(cx, fr_y);
                        color3 = temp.GetPixel(fr_x, cy);
                        color4 = temp.GetPixel(cx, cy);

                        // Blue
                        bp1 = (byte)(nx * color1.B + fx * color2.B);

                        bp2 = (byte)(nx * color3.B + fx * color4.B);

                        nBlue = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

                        // Green
                        bp1 = (byte)(nx * color1.G + fx * color2.G);

                        bp2 = (byte)(nx * color3.G + fx * color4.G);

                        nGreen = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

                        // Red
                        bp1 = (byte)(nx * color1.R + fx * color2.R);

                        bp2 = (byte)(nx * color3.R + fx * color4.R);

                        nRed = (byte)(ny * (double)(bp1) + fy * (double)(bp2));

                        bmap.SetPixel(x, y, System.Drawing.Color.FromArgb
                (255, nRed, nGreen, nBlue));
                    }
                }

       

                // Post-process the resized bitmap: convert to grayscale, then binarize to remove noise.
                bmap = SetGrayscale(bmap);
                bmap = RemoveNoise(bmap);

                return bmap;
            
        }
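
For comparison, the bilinear interpolation that this method performs pixel by pixel can be written in vectorized form. The following NumPy sketch is an approximation of the same idea, not a port: it skips the intermediate byte casts and does not apply the grayscale and noise-removal steps. The function name `resize_bilinear` is illustrative.

```python
import numpy as np

def resize_bilinear(img, new_width, new_height):
    """Bilinear resize of an H x W or H x W x C uint8 array."""
    h, w = img.shape[:2]
    # Map each output coordinate back into the source image.
    xs = np.arange(new_width) * (w / new_width)
    ys = np.arange(new_height) * (h / new_height)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    # Neighbours to the right and below, clamped at the image border.
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    # Fractional offsets, reshaped so they broadcast over rows/columns/channels.
    extra = [1] * (img.ndim - 2)
    wx = (xs - x0).reshape(1, -1, *extra)
    wy = (ys - y0).reshape(-1, 1, *extra)
    src = img.astype(np.float64)
    # Blend horizontally along the upper and lower rows, then vertically.
    top = src[y0][:, x0] * (1 - wx) + src[y0][:, x1] * wx
    bottom = src[y1][:, x0] * (1 - wx) + src[y1][:, x1] * wx
    out = top * (1 - wy) + bottom * wy
    return out.astype(np.uint8)   # truncates, like a (byte) cast

small = np.array([[0, 100], [100, 200]], np.uint8)
enlarged = resize_bilinear(small, 4, 4)
```

Per-pixel `GetPixel`/`SetPixel` loops are slow on large bitmaps, so an array-based approach like this is usually much faster.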

SetGrayscale

public Bitmap SetGrayscale(Bitmap img)
            {
    
                Bitmap temp = (Bitmap)img;
                Bitmap bmap = (Bitmap)temp.Clone();
                Color c;
                for (int i = 0; i < bmap.Width; i++)
                {
                    for (int j = 0; j < bmap.Height; j++)
                    {
                        c = bmap.GetPixel(i, j);
                        byte gray = (byte)(.299 * c.R + .587 * c.G + .114 * c.B);
    
                        bmap.SetPixel(i, j, Color.FromArgb(gray, gray, gray));
                    }
                }
                return (Bitmap)bmap.Clone();
    
            }
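
The same weighted sum can be applied to a whole array at once. This NumPy sketch assumes the channels are in RGB order (OpenCV loads images as BGR, so the weights would need to be reversed there); the function name `to_grayscale` is illustrative.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB uint8 array to an H x W grayscale array."""
    weights = np.array([0.299, 0.587, 0.114])   # same weights as the C# code
    # Weighted sum over the channel axis; the cast truncates like (byte).
    return (rgb @ weights).astype(np.uint8)

pixels = np.array([[[255, 0, 0], [0, 0, 0], [255, 255, 255]]], np.uint8)
gray_pixels = to_grayscale(pixels)
```

Because the cast truncates, pure white can come out as 254 rather than 255 due to floating-point rounding; rounding before the cast avoids this if it matters.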

RemoveNoise

public Bitmap RemoveNoise(Bitmap bmap)
            {
    
                for (var x = 0; x < bmap.Width; x++)
                {
                    for (var y = 0; y < bmap.Height; y++)
                    {
                        var pixel = bmap.GetPixel(x, y);
                        if (pixel.R < 162 && pixel.G < 162 && pixel.B < 162)
                            bmap.SetPixel(x, y, Color.Black);
                        else if (pixel.R >= 162 && pixel.G >= 162 && pixel.B >= 162)
                            bmap.SetPixel(x, y, Color.White);
                    }
                }
    
                return bmap;
            }
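
Despite its name, this method is a fixed-threshold binarization. On a grayscale array the same effect can be sketched in one line of NumPy. This version assumes a single-channel input and maps every pixel to black or white, so it is a simplification of the C# code, which leaves pixels with mixed channel values untouched.

```python
import numpy as np

def binarize_fixed(gray, threshold=162):
    """Map pixels below the threshold to black and all others to white."""
    return np.where(gray < threshold, 0, 255).astype(np.uint8)

levels = np.array([[0, 161, 162, 255]], np.uint8)
black_white = binarize_fixed(levels)
```

The value 162 comes from the code above; it is specific to that author's images and will likely need tuning for other sources.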

This question is from a while ago, but this might still be useful.

My experience shows that resizing the image in-memory before passing it to tesseract sometimes helps.

Try different modes of interpolation. The post https://stackoverflow.com/a/4756906/146003 helped me a lot.

What was EXTREMELY HELPFUL to me along the way was the source code of the Capture2Text project: http://sourceforge.net/projects/capture2text/files/Capture2Text/.

BTW: Kudos to its author for sharing such a painstaking algorithm.

Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c; that's the essence of the image preprocessing for this utility.

If you run the binaries, you can check the image before and after the transformation in the Capture2Text\Output\ folder.

P.S. The solution mentioned above uses Tesseract for OCR and Leptonica for preprocessing.
