How do I rotate individual letters of an image into the right orientation for optimal OCR?

Question

In my previous question, I transformed this image:

[image]

into this:

[image]

which Tesseract OCR interprets as this:

1O351

Putting a frame around the image

[image]

actually improves the OCR result.

 1CB51

However, I need all 5 characters to OCR correctly, so as an experiment I used Paint.NET to rotate and align each individual letter into its proper orientation:

[image]

Resulting in the correct answer:

1CB52

How would I go about performing this correction in C#?

I've done a bit of research on various text alignment algorithms, but they all assume the existence of lines of text in the source image, lines from which you can derive a rotation angle, but which already contain the proper spacing and orientation relationships between the letters.

TheLethalCoder · Accepted Answer

You can use the code in the following CodeProject article to segment each individual character. However, deskewing these characters individually is unlikely to give very good results, because a single character doesn't contain much information to work from.

I tried using AForge.NET's HoughLineTransformation class and got angles in the range of 80 to 90 degrees, so I tried the following code to deskew them:

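// Needs: using System.Collections.Generic; using System.Drawing; using System.Drawing.Imaging;
//        using System.Linq; using AForge.Imaging; (plus the CCL class listed further down)
private static Bitmap DeskewImageByIndividualChars(Bitmap targetBitmap)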
{
    IDictionary<Rectangle, Bitmap> characters = new CCL().Process(targetBitmap);

    using (Graphics g = Graphics.FromImage(targetBitmap))
    {
        foreach (var character in characters)
        {
            double angle;

            BitmapData bitmapData = character.Value.LockBits(new Rectangle(Point.Empty, character.Value.Size), ImageLockMode.ReadWrite, PixelFormat.Format8bppIndexed);
            try
            {
                HoughLineTransformation hlt = new HoughLineTransformation();
                hlt.ProcessImage(bitmapData);

                angle = hlt.GetLinesByRelativeIntensity(0.5).Average(l => l.Theta);
            }
            finally
            {
                character.Value.UnlockBits(bitmapData);
            }

            using (Bitmap bitmap = RotateImage(character.Value, 90 - angle, Color.White))
            {
                g.DrawImage(bitmap, character.Key.Location);
            }
        }
    }

    return targetBitmap;
}

The RotateImage method was taken from here. However, the results weren't great. Maybe you can improve on them.

Here is the code from the CodeProject article for reference. I have made a few changes so that it behaves a bit more safely, such as adding try-finally around the LockBits calls and disposing of objects properly with using statements.

using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

namespace ConnectedComponentLabeling
{
    public class CCL
    {
        private Bitmap _input;
        private int[,] _board;

        public IDictionary<Rectangle, Bitmap> Process(Bitmap input)
        {
            _input = input;
            _board = new int[_input.Width, _input.Height];

            Dictionary<int, List<Pixel>> patterns = Find();
            var images = new Dictionary<Rectangle, Bitmap>();

            foreach (KeyValuePair<int, List<Pixel>> pattern in patterns)
            {
                using (Bitmap bmp = CreateBitmap(pattern.Value))
                {
                    images.Add(GetBounds(pattern.Value), (Bitmap)bmp.Clone());
                }
            }

            return images;
        }

        protected virtual bool CheckIsBackGround(Pixel currentPixel)
        {
            return currentPixel.color.A == 255 && currentPixel.color.R == 255 && currentPixel.color.G == 255 && currentPixel.color.B == 255;
        }

        private unsafe Dictionary<int, List<Pixel>> Find()
        {
            int labelCount = 1;
            var allLabels = new Dictionary<int, Label>();

            BitmapData imageData = _input.LockBits(new Rectangle(0, 0, _input.Width, _input.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
            try
            {
                int bytesPerPixel = 3;

                byte* scan0 = (byte*)imageData.Scan0.ToPointer();
                int stride = imageData.Stride;

                for (int i = 0; i < _input.Height; i++)
                {
                    byte* row = scan0 + (i * stride);

                    for (int j = 0; j < _input.Width; j++)
                    {
                        int bIndex = j * bytesPerPixel;
                        int gIndex = bIndex + 1;
                        int rIndex = bIndex + 2;

                        byte pixelR = row[rIndex];
                        byte pixelG = row[gIndex];
                        byte pixelB = row[bIndex];

                        Pixel currentPixel = new Pixel(new Point(j, i), Color.FromArgb(pixelR, pixelG, pixelB));

                        if (CheckIsBackGround(currentPixel))
                        {
                            continue;
                        }

                        IEnumerable<int> neighboringLabels = GetNeighboringLabels(currentPixel);
                        int currentLabel;

                        if (!neighboringLabels.Any())
                        {
                            currentLabel = labelCount;
                            allLabels.Add(currentLabel, new Label(currentLabel));
                            labelCount++;
                        }
                        else
                        {
                            currentLabel = neighboringLabels.Min(n => allLabels[n].GetRoot().Name);
                            Label root = allLabels[currentLabel].GetRoot();

                            foreach (var neighbor in neighboringLabels)
                            {
                                if (root.Name != allLabels[neighbor].GetRoot().Name)
                                {
                                    allLabels[neighbor].Join(allLabels[currentLabel]);
                                }
                            }
                        }

                        _board[j, i] = currentLabel;
                    }
                }
            }
            finally
            {
                _input.UnlockBits(imageData);
            }

            Dictionary<int, List<Pixel>> patterns = AggregatePatterns(allLabels);

            patterns = RemoveIntrusions(patterns, _input.Width, _input.Height);

            return patterns;
        }

        private Dictionary<int, List<Pixel>> RemoveIntrusions(Dictionary<int, List<Pixel>> patterns, int width, int height)
        {
            var patternsCleaned = new Dictionary<int, List<Pixel>>();

            foreach (var pattern in patterns)
            {
                bool bad = false;
                foreach (Pixel item in pattern.Value)
                {
                    //Horiz
                    if (item.Position.X == 0)
                        bad = true;

                    else if (item.Position.X == width - 1)
                        bad = true;

                    //Vert
                    else if (item.Position.Y == 0)
                        bad = true;

                    else if (item.Position.Y == height - 1)
                        bad = true;
                }

                if (!bad)
                    patternsCleaned.Add(pattern.Key, pattern.Value);

            }

            return patternsCleaned;
        }

        private IEnumerable<int> GetNeighboringLabels(Pixel pix)
        {
            var neighboringLabels = new List<int>();

            for (int i = pix.Position.Y - 1; i <= pix.Position.Y + 2 && i < _input.Height - 1; i++)
            {
                for (int j = pix.Position.X - 1; j <= pix.Position.X + 2 && j < _input.Width - 1; j++)
                {
                    if (i > -1 && j > -1 && _board[j, i] != 0)
                    {
                        neighboringLabels.Add(_board[j, i]);
                    }
                }
            }

            return neighboringLabels;
        }

        private Dictionary<int, List<Pixel>> AggregatePatterns(Dictionary<int, Label> allLabels)
        {
            var patterns = new Dictionary<int, List<Pixel>>();

            for (int i = 0; i < _input.Height; i++)
            {
                for (int j = 0; j < _input.Width; j++)
                {
                    int patternNumber = _board[j, i];
                    if (patternNumber != 0)
                    {
                        patternNumber = allLabels[patternNumber].GetRoot().Name;

                        if (!patterns.ContainsKey(patternNumber))
                        {
                            patterns[patternNumber] = new List<Pixel>();
                        }

                        patterns[patternNumber].Add(new Pixel(new Point(j, i), Color.Black));
                    }
                }
            }

            return patterns;
        }

        private unsafe Bitmap CreateBitmap(List<Pixel> pattern)
        {
            int minX = pattern.Min(p => p.Position.X);
            int maxX = pattern.Max(p => p.Position.X);

            int minY = pattern.Min(p => p.Position.Y);
            int maxY = pattern.Max(p => p.Position.Y);

            int width = maxX + 1 - minX;
            int height = maxY + 1 - minY;

            Bitmap bmp = DrawFilledRectangle(width, height);

            BitmapData imageData = bmp.LockBits(new Rectangle(0, 0, bmp.Width, bmp.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
            try
            {
                byte* scan0 = (byte*)imageData.Scan0.ToPointer();
                int stride = imageData.Stride;

                foreach (Pixel pix in pattern)
                {
                    scan0[((pix.Position.X - minX) * 3) + (pix.Position.Y - minY) * stride] = pix.color.B;
                    scan0[((pix.Position.X - minX) * 3) + (pix.Position.Y - minY) * stride + 1] = pix.color.G;
                    scan0[((pix.Position.X - minX) * 3) + (pix.Position.Y - minY) * stride + 2] = pix.color.R;
                }
            }
            finally
            {
                bmp.UnlockBits(imageData);
            }

            return bmp;
        }

        private Bitmap DrawFilledRectangle(int x, int y)
        {
            Bitmap bmp = new Bitmap(x, y);
            using (Graphics graph = Graphics.FromImage(bmp))
            {
                Rectangle ImageSize = new Rectangle(0, 0, x, y);
                graph.FillRectangle(Brushes.White, ImageSize);
            }

            return bmp;
        }

        private Rectangle GetBounds(List<Pixel> pattern)
        {
            var points = pattern.Select(x => x.Position);

            var x_query = points.Select(p => p.X);
            int xmin = x_query.Min();
            int xmax = x_query.Max();

            var y_query = points.Select(p => p.Y);
            int ymin = y_query.Min();
            int ymax = y_query.Max();

            return new Rectangle(xmin, ymin, xmax - xmin, ymax - ymin);
        }
    }
}

With the above code I got the following input/output:

Input Output

As you can see the B has rotated quite well but the others aren't as good.
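The CCL listing above relies on two helper types, Pixel and Label, that are not shown in this answer. The following is only a minimal sketch of what they might look like, reconstructed from how CCL uses them (Position, color, GetRoot, Name, Join); the field names _parent and _rank and the union-by-rank detail are my assumptions, not the article's code. The listing also uses unsafe blocks, so the project has to allow unsafe code.

```csharp
using System.Drawing;

namespace ConnectedComponentLabeling
{
    // Hypothetical reconstruction: a pixel with the two members CCL reads.
    public class Pixel
    {
        public Point Position { get; }
        public Color color { get; } // lower-case to match its usage in CCL

        public Pixel(Point position, Color color)
        {
            Position = position;
            this.color = color;
        }
    }

    // Hypothetical reconstruction: a union-find (disjoint-set) node.
    public class Label
    {
        public int Name { get; }
        private Label _parent;
        private int _rank;

        public Label(int name)
        {
            Name = name;
            _parent = this; // every label starts as its own root
        }

        // Follows parent links to the root, compressing the path on the way.
        public Label GetRoot()
        {
            if (_parent != this)
            {
                _parent = _parent.GetRoot();
            }
            return _parent;
        }

        // Merges the set containing this label with the set containing other.
        public void Join(Label other)
        {
            Label a = GetRoot();
            Label b = other.GetRoot();
            if (a == b)
            {
                return;
            }

            if (a._rank < b._rank)
            {
                a._parent = b;
            }
            else
            {
                b._parent = a;
                if (a._rank == b._rank)
                {
                    a._rank++;
                }
            }
        }
    }
}
```

With this sketch, two labels belong to the same character exactly when their GetRoot() results are the same object, which is what AggregatePatterns depends on when it groups pixels by root name.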

An alternative to deskewing the individual characters is to find their locations using the segmentation routine above, then pass each character to your recognition engine separately and see whether this improves your results.
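A rough sketch of that per-character approach follows. It is deliberately engine-agnostic: recognize is a placeholder delegate for whatever OCR call you use, and the class and method names are made up for illustration. The one thing it does take care of is ordering, since a Dictionary does not guarantee that the segments come back left to right.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PerCharacterOcr
{
    // charactersByLeftEdge: each segmented character keyed by the X of its bounding box.
    // recognize: placeholder for the OCR call on a single character image.
    public static string RecognizeLeftToRight<TImage>(
        IEnumerable<KeyValuePair<int, TImage>> charactersByLeftEdge,
        Func<TImage, string> recognize)
    {
        return string.Concat(
            charactersByLeftEdge
                .OrderBy(c => c.Key) // restore reading order
                .Select(c => (recognize(c.Value) ?? string.Empty).Trim()));
    }
}
```

With the CCL output this could be called with characters.Select(c => new KeyValuePair<int, Bitmap>(c.Key.X, c.Value)). If the engine is Tesseract, its single-character page segmentation mode (PSM 10) is probably worth trying for each call, although I haven't checked whether it helps on this image.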

I have used the following method to find the angle of the character using the List<Pixel> from inside the CCL class. It works by finding the angle between the "bottom left" and "bottom right" points. I haven't tested if it works if the character is rotated the other way around.

private double GetAngle(List<Pixel> pattern)
{
    var pixels = pattern.Select(p => p.Position).ToArray();

    Point bottomLeft = pixels.OrderByDescending(p => p.Y).ThenBy(p => p.X).First();
    Point rightBottom = pixels.OrderByDescending(p => p.X).ThenByDescending(p => p.Y).First();

    int xDiff = rightBottom.X - bottomLeft.X;
    int yDiff = rightBottom.Y - bottomLeft.Y;

    double angle = Math.Atan2(yDiff, xDiff) * 180 / Math.PI;

    return -angle;
}

Note that my drawing code is a bit broken, which is why the 5 is cut off on the right, but this code produces the following output:

Output

Note that the B and the 5 are rotated further than you'd expect because of their curvature.
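To make the curvature effect concrete, here is the same bottom-edge calculation as a standalone sketch on plain (X, Y) tuples instead of the Pixel type (the class and method names are mine, chosen only for illustration).

```csharp
using System;
using System.Linq;

public static class BottomEdgeAngle
{
    // Same calculation as GetAngle above: the angle of the line from the
    // lowest pixel (leftmost on ties) to the rightmost pixel (lowest on ties).
    public static double Measure((int X, int Y)[] pixels)
    {
        var bottomLeft = pixels.OrderByDescending(p => p.Y).ThenBy(p => p.X).First();
        var rightBottom = pixels.OrderByDescending(p => p.X).ThenByDescending(p => p.Y).First();

        int xDiff = rightBottom.X - bottomLeft.X;
        int yDiff = rightBottom.Y - bottomLeft.Y;

        // Image Y grows downwards, hence the sign flip.
        return -(Math.Atan2(yDiff, xDiff) * 180 / Math.PI);
    }
}
```

For a flat-bottomed shape with corners (0,0), (10,0), (0,10) and (10,10) this returns 0, and for the same shape tilted so that the bottom edge runs from (0,12) to (10,10) it returns about 11.3. For an upright, rounded shape with extreme pixels at (5,0), (0,5), (10,5) and (5,10), however, the lowest pixel is in the middle of the bottom curve and the rightmost pixel is halfway up, so it returns about 45 even though no rotation is needed. That is the over-rotation seen on the B and the 5.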

The following code gets the angle from both the left and right edges and then chooses the better one, and the rotations seem to be better. Note that I have only tested it with letters that need rotating clockwise, so it might not work as well if they need to go the opposite way.

This also splits the pixels into quadrants so that each point is chosen from its own quadrant, which avoids picking two that are too close together.

The idea in selecting the best angle is that if the two are similar (currently within 1.5 degrees of each other, but this can easily be changed), we average them. Otherwise we pick the one closest to zero.

private double GetAngle(List<Pixel> pattern, Rectangle bounds)
{
    int halfWidth = bounds.X + (bounds.Width / 2);
    int halfHeight = bounds.Y + (bounds.Height / 2);

    double leftEdgeAngle = GetAngleLeftEdge(pattern, halfWidth, halfHeight);
    double rightEdgeAngle = GetAngleRightEdge(pattern, halfWidth, halfHeight);

    if (Math.Abs(leftEdgeAngle - rightEdgeAngle) <= 1.5)
    {
        return (leftEdgeAngle + rightEdgeAngle) / 2d;
    }

    if (Math.Abs(leftEdgeAngle) > Math.Abs(rightEdgeAngle))
    {
        return rightEdgeAngle;
    }
    else
    {
        return leftEdgeAngle;
    }
}

private double GetAngleLeftEdge(List<Pixel> pattern, double halfWidth, double halfHeight)
{
    var topLeftPixels = pattern.Select(p => p.Position).Where(p => p.Y < halfHeight && p.X < halfWidth).ToArray();
    var bottomLeftPixels = pattern.Select(p => p.Position).Where(p => p.Y > halfHeight && p.X < halfWidth).ToArray();

    Point topLeft = topLeftPixels.OrderBy(p => p.X).ThenBy(p => p.Y).First();
    Point bottomLeft = bottomLeftPixels.OrderByDescending(p => p.Y).ThenBy(p => p.X).First();

    int xDiff = bottomLeft.X - topLeft.X;
    int yDiff = bottomLeft.Y - topLeft.Y;

    double angle = Math.Atan2(yDiff, xDiff) * 180 / Math.PI;

    return 90 - angle;
}

private double GetAngleRightEdge(List<Pixel> pattern, double halfWidth, double halfHeight)
{
    var topRightPixels = pattern.Select(p => p.Position).Where(p => p.Y < halfHeight && p.X > halfWidth).ToArray();
    var bottomRightPixels = pattern.Select(p => p.Position).Where(p => p.Y > halfHeight && p.X > halfWidth).ToArray();

    Point topRight = topRightPixels.OrderBy(p => p.Y).ThenByDescending(p => p.X).First();
    Point bottomRight = bottomRightPixels.OrderByDescending(p => p.X).ThenByDescending(p => p.Y).First();

    int xDiff = bottomRight.X - topRight.X;
    int yDiff = bottomRight.Y - topRight.Y;

    double angle = Math.Atan2(xDiff, yDiff) * 180 / Math.PI;

    return Math.Abs(angle);
}

This now produces the following output (again, my drawing code is slightly broken). The C looks as though it hasn't deskewed very well, but on closer inspection that is just a result of its shape.

Output
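The selection rule inside GetAngle can also be pulled out on its own, which makes the 1.5 degree threshold easy to tune. This is just a sketch of the same rule with the threshold as a parameter; the class name and the tolerance parameter are my own naming.

```csharp
using System;

public static class AngleSelection
{
    // If the two edge angles agree to within tolerance degrees, average them;
    // otherwise return whichever is closer to zero (left wins an exact tie,
    // as in GetAngle above).
    public static double Choose(double leftEdgeAngle, double rightEdgeAngle, double tolerance = 1.5)
    {
        if (Math.Abs(leftEdgeAngle - rightEdgeAngle) <= tolerance)
        {
            return (leftEdgeAngle + rightEdgeAngle) / 2d;
        }

        return Math.Abs(leftEdgeAngle) > Math.Abs(rightEdgeAngle)
            ? rightEdgeAngle
            : leftEdgeAngle;
    }
}
```

For example, Choose(10, 11) averages to 10.5, while Choose(10, 25) treats the edges as disagreeing and returns 10. Preferring the smaller angle is a cautious choice: when one edge is curved, as on the C, the more extreme reading is usually the misleading one.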

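I improved the drawing code and also attempted to get the characters onto the same baseline. Note that this version expects CCL.Process to return each character's bitmap together with its angle, as a Tuple<Bitmap, double>, which is a change from the listing above: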

private static Bitmap DeskewImageByIndividualChars(Bitmap bitmap)
{
    IDictionary<Rectangle, Tuple<Bitmap, double>> characters = new CCL().Process(bitmap);

    Bitmap deskewedBitmap = new Bitmap(bitmap.Width, bitmap.Height, bitmap.PixelFormat);
    deskewedBitmap.SetResolution(bitmap.HorizontalResolution, bitmap.VerticalResolution);

    using (Graphics g = Graphics.FromImage(deskewedBitmap))
    {
        g.FillRectangle(Brushes.White, new Rectangle(Point.Empty, deskewedBitmap.Size));

        int baseLine = characters.Max(c => c.Key.Bottom);
        foreach (var character in characters)
        {
            int y = character.Key.Y;
            if (character.Key.Bottom != baseLine)
            {
                y += (baseLine - character.Key.Bottom - 1);
            }

            using (Bitmap characterBitmap = RotateImage(character.Value.Item1, character.Value.Item2, Color.White))
            {
                g.DrawImage(characterBitmap, new Point(character.Key.X, y));
            }
        }
    }

    return deskewedBitmap;
}

This then produces the following output. Note that the characters aren't on exactly the same baseline, because the baseline is worked out from each character's bottom before rotation. To improve this, the baseline would need to be calculated after rotation. Thresholding the image before working out the baseline would also help.

Another improvement would be to calculate the right edge of each rotated character so that the next one is drawn without overlapping the previous one and cutting bits off. As you can see in the output, the 2 cuts slightly into the 5.

The output is now very similar to the manually created one in the OP.

Output
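The two improvements suggested above (a post-rotation baseline and non-overlapping placement) could be sketched roughly as follows. I haven't run this against the image in the question. It assumes the characters are passed in left-to-right order and that each rotated bitmap is cropped tightly to its glyph, which depends on how RotateImage pads its result; the class, method and parameter names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class CharacterLayout
{
    // rotated: for each character, its original left edge and the size of its
    //          bitmap after rotation, in left-to-right order.
    // baseLine: the Y coordinate every character's bottom edge should sit on.
    // gap: minimum number of pixels between neighbouring characters.
    // Returns the top-left point at which to draw each rotated bitmap.
    public static List<(int X, int Y)> Place(
        IReadOnlyList<(int OriginalX, int Width, int Height)> rotated,
        int baseLine,
        int gap = 1)
    {
        var positions = new List<(int X, int Y)>();
        int? previousRight = null;

        foreach (var character in rotated)
        {
            // Keep the original position unless it would overlap the previous character.
            int x = previousRight.HasValue
                ? Math.Max(character.OriginalX, previousRight.Value + gap)
                : character.OriginalX;

            // Align on the bottom of the rotated bitmap, not the pre-rotation bounds.
            int y = baseLine - character.Height;

            positions.Add((x, y));
            previousRight = x + character.Width;
        }

        return positions;
    }
}
```

In DeskewImageByIndividualChars this would mean rotating all the characters first, collecting their Width and Height, and only then calling g.DrawImage at the returned points.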

Tags: c#, tesseract, aforge

Robert Harvey