Does anybody have a suggestion for a Java library that performs automatic cropping and deskewing of images (like those retrieved from a flatbed scanner)?
Deskewing
Take a look at Tess4j (Java JNA wrapper for Tesseract).
You can combine ImageDeskew.getSkewAngle() with ImageHelper.rotateImage(BufferedImage image, double angle).
There is an example of how to use it in the test folder of the Tess4J project, in Tesseract1Test.java:
    public void testDoOCR_SkewedImage() throws Exception {
        logger.info("doOCR on a skewed PNG image");
        File imageFile = new File(this.testResourcesDataPath, "eurotext_deskew.png");
        BufferedImage bi = ImageIO.read(imageFile);
        ImageDeskew id = new ImageDeskew(bi);
        double imageSkewAngle = id.getSkewAngle(); // determine skew angle
        if ((imageSkewAngle > MINIMUM_DESKEW_THRESHOLD || imageSkewAngle < -(MINIMUM_DESKEW_THRESHOLD))) {
            bi = ImageHelper.rotateImage(bi, -imageSkewAngle); // deskew image
        }
        String expResult = "The (quick) [brown] {fox} jumps!\nOver the $43,456.78 <lazy> #90 dog";
        String result = instance.doOCR(bi);
        logger.info(result);
        assertEquals(expResult, result.substring(0, expResult.length()));
    }
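If you only need the rotation step and would rather not pull in a dependency for it, plain Java2D can do the same job. The following is a minimal sketch, not Tess4J's implementation: the class name RotateSketch and the method rotate are made up for illustration, and it assumes you are happy to keep the original canvas size (so corners can be clipped) and fill the exposed area with a background color.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Tess4J.
final class RotateSketch {

    /**
     * Rotates src about its center by angleDegrees. A positive angle turns the
     * content clockwise on screen, because Java2D's y axis points down.
     * The output has the same size as the input, so corners may be clipped.
     */
    static BufferedImage rotate(BufferedImage src, double angleDegrees, Color background) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            // Paint the background first so uncovered corners are not left black.
            g.setColor(background);
            g.fillRect(0, 0, w, h);
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Rotate around the image center, then draw the source through that transform.
            g.rotate(Math.toRadians(angleDegrees), w / 2.0, h / 2.0);
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

To deskew, you would pass the negated skew angle, e.g. RotateSketch.rotate(bi, -imageSkewAngle, Color.WHITE), assuming the angle uses the same sign convention as this sketch.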
ImageMagick can do that; you can use the ImageMagick Java bindings. The auto-crop operator is probably what you're looking for. Automatic deskewing is a much harder problem and involves some significant image processing; I'm not sure if ImageMagick can handle that. If you can figure out the skewing parameters using something else, ImageMagick can definitely unskew it for you.
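On the ImageMagick command line, border trimming is done with the -trim option, usually combined with -fuzz to tolerate scanner noise. If you want to see what that kind of auto-crop amounts to, or avoid a native dependency, here is a rough pure-Java sketch of the same idea. It is not ImageMagick's algorithm: the class and method names are invented for this example, and it assumes the top-left pixel is representative of the scanner background.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not an ImageMagick binding.
final class AutoCropSketch {

    /**
     * Returns the bounding box of all pixels that differ from the top-left
     * pixel by more than tolerance in any RGB channel, or null if none do.
     */
    static Rectangle contentBounds(BufferedImage img, int tolerance) {
        int background = img.getRGB(0, 0); // assumption: corner pixel is background
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (differs(img.getRGB(x, y), background, tolerance)) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) {
            return null; // nothing but background
        }
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    /** Compares the red, green, and blue channels of two packed RGB values. */
    static boolean differs(int a, int b, int tolerance) {
        for (int shift = 0; shift <= 16; shift += 8) {
            int ca = (a >> shift) & 0xFF;
            int cb = (b >> shift) & 0xFF;
            if (Math.abs(ca - cb) > tolerance) {
                return true;
            }
        }
        return false;
    }

    /** Crops to the content bounds; returns the input unchanged if it is all background. */
    static BufferedImage autoCrop(BufferedImage img, int tolerance) {
        Rectangle r = contentBounds(img, tolerance);
        if (r == null) {
            return img;
        }
        // Note: getSubimage shares pixel data with the original image.
        return img.getSubimage(r.x, r.y, r.width, r.height);
    }
}
```

A tolerance around 10 to 30 is a plausible starting point for scans, but that number is a guess and would need tuning. Crop after deskewing, since a skewed page has a larger bounding box than a straightened one.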
I wrote a simple port of a very good deskewer. It works best if you have some text in the image.
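To give an idea of why text helps: rows of text form roughly parallel dark bands, and a skew estimator can search for the angle at which those bands line up. The sketch below is not the port mentioned above. It is a simple projection-profile search written for illustration, with made-up names (SkewEstimateSketch, estimateSkewDegrees), a fixed darkness threshold of 128, and a brute-force scan over candidate angles.

```java
import java.awt.image.BufferedImage;

// Hypothetical projection-profile estimator for illustration only.
final class SkewEstimateSketch {

    /**
     * Tries angles from -maxAngle to +maxAngle in increments of step (degrees).
     * For each angle, dark pixels are projected onto the axis perpendicular to
     * that direction; the angle with the "peakiest" histogram wins.
     * Positive result: text lines slope downward to the right on screen.
     * Returns 0 if the image contains no dark pixels.
     */
    static double estimateSkewDegrees(BufferedImage img, double maxAngle, double step) {
        int w = img.getWidth();
        int h = img.getHeight();
        int diag = (int) Math.ceil(Math.hypot(w, h)); // bound on any projected coordinate
        double bestAngle = 0.0;
        double bestScore = 0.0;
        int steps = (int) Math.round(2 * maxAngle / step);
        for (int i = 0; i <= steps; i++) {
            double angle = -maxAngle + i * step;
            double sin = Math.sin(Math.toRadians(angle));
            double cos = Math.cos(Math.toRadians(angle));
            long[] bins = new long[2 * diag + 1];
            for (int y = 0; y < h; y++) {
                for (int x = 0; x < w; x++) {
                    if (isDark(img.getRGB(x, y))) {
                        // Signed distance from the origin along the normal of the candidate line direction.
                        int bin = (int) Math.round(y * cos - x * sin) + diag;
                        bins[bin]++;
                    }
                }
            }
            // Sum of squares is largest when dark pixels concentrate in few bins.
            double score = 0.0;
            for (long b : bins) {
                score += (double) b * b;
            }
            if (score > bestScore) {
                bestScore = score;
                bestAngle = angle;
            }
        }
        return bestAngle;
    }

    /** Treats a pixel as dark if its average RGB value is below 128 (an arbitrary threshold). */
    static boolean isDark(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3 < 128;
    }
}
```

Rotating the image by the negative of the returned angle should straighten it. On an image without text or ruled lines the histogram has no clear peaks, so the estimate becomes unreliable, which matches the caveat above.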