 

Using .Net to deskew an image

I have been searching high and low for a reliable way to deskew an image in .Net, and am not having much luck.

At the minute I am using AForge. This is a pain because I am working with WPF, so the images I am working with are BitmapImage objects rather than Bitmap objects. That means I need to start with a BitmapImage, save it to a memory stream, create a new Bitmap from that stream, run the deskewing process, save the deskewed image to a new memory stream, and then create a new BitmapImage from that stream. On top of all that, the deskewing isn't great.

I am trying to read OMR data from a piece of paper run through a scanner, so I need to rely on a particular OMR box being at the same co-ordinates every time; the deskewing has to be reliable.

So I am using AForge at the minute, and I can't find any other free/open source libraries for image deskewing in .Net; everything I have found is either properly expensive or written in C/C++.

My question is: do other free/open source libraries exist that assist with image deskewing in .Net? If so, what are they called? If not, how should I approach this problem?

Edit: For example, let's say I have the below page:

Initial Image

Note: This is for illustrative purposes only, but the actual image does indeed have a black rectangle at each corner of the page, maybe this will help.

When I print this out, and scan it back into my scanner, it looks like this:

Scanned Image

I need to deskew this image so that my box is in the same place each time. In the real world, there are a lot of boxes, they are smaller and close together, so the accuracy is important.

My current method for this is a massive, ineffective pain in the ass:

using AForge.Imaging;
using AForge.Imaging.Filters;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Media.Imaging;

public static BitmapImage DeskewBitmap(BitmapImage skewedBitmap)
{
    //Using a memory stream to minimise disk IO
    var memoryStream = BitmapImageToMemoryStream(skewedBitmap);

    var bitmap = MemoryStreamToBitmap(memoryStream);
    var skewAngle = CalculateSkewAngle(bitmap);

    //AForge needs an 8bpp indexed image for the deskewing process
    var bitmapConvertedToBbppIndexed = ConvertBitmapToBbppIndexed(bitmap);

    var rotatedImage = DeskewBitmap(skewAngle, bitmapConvertedToBbppIndexed);

    //I need to convert the image back to a non indexed format to put it back into a BitmapImage object
    var imageConvertedToNonIndexed = ConvertImageToNonIndexed(rotatedImage);

    var imageAsMemoryStream = BitmapToMemoryStream(imageConvertedToNonIndexed);
    var memoryStreamAsBitmapImage = MemoryStreamToBitmapImage(imageAsMemoryStream);

    return memoryStreamAsBitmapImage;
}

private static Bitmap ConvertImageToNonIndexed(Bitmap rotatedImage)
{
    var imageConvertedToNonIndexed = rotatedImage.Clone(
        new Rectangle(0, 0, rotatedImage.Width, rotatedImage.Height), PixelFormat.Format32bppArgb);
    return imageConvertedToNonIndexed;
}

private static Bitmap DeskewBitmap(double skewAngle, Bitmap bitmapConvertedToBbppIndexed)
{
    var rotationFilter = new RotateBilinear(-skewAngle) { FillColor = Color.White };

    var rotatedImage = rotationFilter.Apply(bitmapConvertedToBbppIndexed);
    return rotatedImage;
}

private static double CalculateSkewAngle(Bitmap bitmapConvertedToBbppIndexed)
{
    var documentSkewChecker = new DocumentSkewChecker();

    double skewAngle = documentSkewChecker.GetSkewAngle(bitmapConvertedToBbppIndexed);

    return skewAngle;
}

private static Bitmap ConvertBitmapToBbppIndexed(Bitmap bitmap)
{
    var bitmapConvertedToBbppIndexed = bitmap.Clone(
        new Rectangle(0, 0, bitmap.Width, bitmap.Height), PixelFormat.Format8bppIndexed);
    return bitmapConvertedToBbppIndexed;
}

private static BitmapImage ResizeBitmap(BitmapImage originalBitmap, int desiredWidth, int desiredHeight)
{
    var ms = BitmapImageToMemoryStream(originalBitmap);
    ms.Position = 0;

    var result = new BitmapImage();
    result.BeginInit();
    result.DecodePixelHeight = desiredHeight;
    result.DecodePixelWidth = desiredWidth;

    result.StreamSource = ms;
    result.CacheOption = BitmapCacheOption.OnLoad;

    result.EndInit();
    result.Freeze();

    return result;
}

private static MemoryStream BitmapImageToMemoryStream(BitmapImage image)
{
    var ms = new MemoryStream();

    var encoder = new JpegBitmapEncoder();
    encoder.Frames.Add(BitmapFrame.Create(image));

    encoder.Save(ms);

    return ms;
}

private static BitmapImage MemoryStreamToBitmapImage(MemoryStream ms)
{
    ms.Position = 0;
    var bitmap = new BitmapImage();

    bitmap.BeginInit();

    bitmap.StreamSource = ms;
    bitmap.CacheOption = BitmapCacheOption.OnLoad;

    bitmap.EndInit();
    bitmap.Freeze();

    return bitmap;
}

private static Bitmap MemoryStreamToBitmap(MemoryStream ms)
{
    return new Bitmap(ms);
}

private static MemoryStream BitmapToMemoryStream(Bitmap image)
{
    var memoryStream = new MemoryStream();
    image.Save(memoryStream, ImageFormat.Bmp);

    return memoryStream;
}

In retrospect, a couple more questions:

  1. Am I using AForge correctly?
  2. Is AForge the best library to use for this task?
  3. How could my current approach to this be improved to get more accurate results?
Asked by JMK on Jan 04 '13




2 Answers

Given the sample input, it is clear that you are not after image deskewing. That kind of operation will not correct the distortion you have; instead, you need to perform a perspective transform. This can be clearly seen in the following figure. The four white rectangles represent the edges of your four black boxes, and the yellow lines are the result of connecting the black boxes. The yellow quadrilateral is not simply a skewed version of the red one (the one you want to achieve).

[figure: the four white box edges, the yellow quadrilateral connecting them, and the red target rectangle]

So, if you can actually get the figure above, the problem becomes a lot simpler. If you did not have the four corner boxes, you would need four other reference points, so they help you a lot. Once you have the image above, you know the four yellow corners, and you simply map them to the four red corners. This is the perspective transform you need to do, and your library may well have a ready-made function for it (there is at least one; check the comments on your question).
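For the curious, those four point correspondences pin down the eight unknowns of the perspective transform. Here is a minimal numpy sketch of the solve; the function names are mine, and the corner coordinates are the ones this answer's detection step prints further down:

```python
import numpy as np

def solve_perspective(src, dst):
    """Return the 3x3 homography H mapping each src (x, y) to dst (X, Y):
        X = (H00*x + H01*y + H02) / (H20*x + H21*y + 1)
        Y = (H10*x + H11*y + H12) / (H20*x + H21*y + 1)
    Four point pairs give eight linear equations in eight unknowns
    (H22 is fixed to 1)."""
    rows, rhs = [], []
    for (x, y), (X, Y) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -X * x, -X * y])
        rows.append([0, 0, 0, x, y, 1, -Y * x, -Y * y])
        rhs.extend([X, Y])
    h = np.linalg.solve(np.array(rows, float), np.array(rhs, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, x, y):
    """Apply the homography to a single point."""
    v = H @ np.array([x, y, 1.0])
    return v[0] / v[2], v[1] / v[2]

# The detected ("yellow") corners and the target ("red") rectangle.
quad = [(55, 30), (734, 26), (747, 1045), (41, 1036)]
rect = [(41, 26), (747, 26), (747, 1045), (41, 1045)]
H = solve_perspective(quad, rect)
# By construction, each detected corner now lands on its target corner.
```

The same eight-coefficient system is what any perspective-transform routine solves under the hood, whatever the library.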

There are multiple ways to get to the image above, so I will describe a relatively simple one.

First, binarize your grayscale image. I picked a simple global threshold of 100 (your image is in the range [0, 255]), which keeps the boxes and other details in the image (like the strong lines around it): intensities at or above 100 are set to 255, and those below 100 are set to 0. But, since this is a printed image, how dark the boxes appear is very likely to vary, so you might need a better method here; something as simple as a morphological gradient could potentially work better.

The second step is to eliminate irrelevant detail. To do that, perform a morphological closing with a 7x7 square (about 1% of the minimum of the width and the height of the input image). To get the borders of the boxes, compute current_image - erosion(current_image) using an elementary 3x3 square. Now you have an image with the four white contours as above (this assumes everything but the boxes has been eliminated, which I believe is a simplification of your other inputs).

To get the pixels of these white contours, run connected component labeling. With these four components, determine which one is at the top left, top right, bottom left, and bottom right. Then you can easily find the corners of the yellow quadrilateral. All these operations are readily available in AForge, so it is only a matter of translating the following code to C#:

import sys
import numpy
from PIL import Image, ImageOps, ImageDraw
from scipy.ndimage import morphology, label

# Read input image and convert to grayscale (if it is not yet).
orig = Image.open(sys.argv[1])
img = ImageOps.grayscale(orig)

# Convert PIL image to numpy array (minor implementation detail).
im = numpy.array(img)

# Binarize.
im[im < 100] = 0
im[im >= 100] = 255

# Eliminate undesired details.
im = morphology.grey_closing(im, (7, 7))

# Border of boxes.
im = im - morphology.grey_erosion(im, (3, 3))

# Find the boxes by labeling them as connected components.
lbl, amount = label(im)
box = []
for i in range(1, amount + 1):
    py, px = numpy.nonzero(lbl == i) # Points in this connected component.
    # Corners of the boxes.
    box.append((px.min(), px.max(), py.min(), py.max()))
box = sorted(box)
# Now the first two elements in the box list are the two
# left-most boxes, and the other two are the right-most
# boxes. It remains to establish which ones are at the top
# and which at the bottom.
top = []
bottom = []
for index in [0, 2]:
    if box[index][2] > box[index+1][2]:
        top.append(box[index + 1])
        bottom.append(box[index])
    else:
        top.append(box[index])
        bottom.append(box[index + 1])

# Pick the top left corner, top right corner,
# bottom right corner, and bottom left corner.
reference_corners = [
        (top[0][0], top[0][2]), (top[1][1], top[1][2]),
        (bottom[1][1], bottom[1][3]), (bottom[0][0], bottom[0][3])]

# Convert the image back to PIL (minor implementation detail).
img = Image.fromarray(im)
# Draw lines connecting the reference_corners for visualization purposes.
visual = img.convert('RGB')
draw = ImageDraw.Draw(visual)
draw.line(reference_corners + [reference_corners[0]], fill='yellow')
visual.save(sys.argv[2])

# Map the current quadrilateral to an axis-aligned rectangle.
min_x = min(x for x, y in reference_corners)
max_x = max(x for x, y in reference_corners)
min_y = min(y for x, y in reference_corners)
max_y = max(y for x, y in reference_corners)

# The red rectangle.
perfect_rect = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y)]

# Use these points to do the perspective transform.
print(reference_corners)
print(perfect_rect)

The final output of the code above with your input image is:

[(55, 30), (734, 26), (747, 1045), (41, 1036)]
[(41, 26), (747, 26), (747, 1045), (41, 1045)]

The first list of points describes the four corners of the yellow quadrilateral, and the second one the corners of the red rectangle. To do the perspective transform, you can use the ready-made function in AForge. For simplicity, I used ImageMagick:

convert input.png -distort Perspective "55,30,41,26 734,26,747,26 747,1045,747,1045 41,1036,41,1045" result.png

which gives the alignment you are after (with blue lines, found as before, drawn to better show the result):

[figure: the transformed result, with blue alignment lines]

You may notice that the left vertical blue line is not fully straight; in fact, the two left-most boxes are misaligned by 1 pixel along the x axis. This may be correctable by using a different interpolation during the perspective transform.
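For completeness, here is a rough Pillow equivalent of that ImageMagick call, assuming Pillow and numpy are available. Pillow's PERSPECTIVE transform maps each *output* point back into the input image, so the eight coefficients are solved in that direction; the resample argument is where a different interpolation can be chosen. The blank stand-in image below replaces the real scanned page:

```python
import numpy as np
from PIL import Image

def pil_perspective_coeffs(src_quad, dst_quad):
    """Coefficients for Image.transform(..., Image.PERSPECTIVE, ...),
    which maps each output point (dst_quad) back to its input point
    (src_quad)."""
    rows, rhs = [], []
    for (sx, sy), (dx, dy) in zip(src_quad, dst_quad):
        rows.append([dx, dy, 1, 0, 0, 0, -sx * dx, -sx * dy])
        rows.append([0, 0, 0, dx, dy, 1, -sy * dx, -sy * dy])
        rhs.extend([sx, sy])
    return np.linalg.solve(np.array(rows, float), np.array(rhs, float)).tolist()

yellow = [(55, 30), (734, 26), (747, 1045), (41, 1036)]  # detected corners
red = [(41, 26), (747, 26), (747, 1045), (41, 1045)]     # target rectangle

img = Image.new('L', (800, 1100), 255)  # stand-in for the scanned page
coeffs = pil_perspective_coeffs(yellow, red)
# Swap resample (NEAREST, BILINEAR, BICUBIC) to change the interpolation.
aligned = img.transform(img.size, Image.PERSPECTIVE, coeffs,
                        resample=Image.BICUBIC)
```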

Answered by mmgp on Sep 20 '22


John, the Leptonica library is meant to be very fast and stable. Here is a link on how to call it from C#: http://www.leptonica.com/vs2008doc/csharp-and-leptonlib.html. I'm not sure if this is the answer, so I've just added it as a comment.

It has a LeptonicaCLR.Utils.DeskewBinaryImage() method to actually deskew a b&w image.

I'm not sure how well it would work on the actual forms you are trying to process.

Answered by DermFrench on Sep 20 '22