Here is the challenge I'm currently facing.
I have a lot of PDFs and I have to remove the blank pages inside them and display only the pages with content (text or images).
The problem is that those pdfs are scanned documents.
So the blank pages have some dirty left behind by the scanner.
I did some research and ended up with this code that checks for 99% of the page as white or light gray. I needed the gray factor as the scanned documents sometimes are not pure white.
private static Boolean isBlank(PDPage pdfPage) throws IOException {
BufferedImage bufferedImage = pdfPage.convertToImage();
long count = 0;
int height = bufferedImage.getHeight();
int width = bufferedImage.getWidth();
Double areaFactor = (width * height) * 0.99;
for (int x = 0; x < width ; x++) {
for (int y = 0; y < height ; y++) {
Color c = new Color(bufferedImage.getRGB(x, y));
// verify light gray and white
if (c.getRed() == c.getGreen() && c.getRed() == c.getBlue()
&& c.getRed() >= 248) {
count++;
}
}
}
if (count >= areaFactor) {
return true;
}
return false;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With