Save tiff CCITTFaxDecode (from PDF page) using iText and Java

I'm using iText to extract embedded images and save them as separate files. The .jpg and .png files come out fine, but I cannot extract TIFF images that use the CCITTFaxDecode filter.

Does anyone have a way of saving the tiff files?

I found some sample C# code that uses iTextSharp in "Extracting image from PDF with /CCITTFaxDecode filter". It indicates that a separate TIFF library is needed to write out the results. According to that article, the "CCITTFaxDecode" compression corresponds to Compression.CCITTFAX4 in the TIFF library.

To use that article's method, I need to:

  1. Get a TIFF library. The Java Image I/O API can read and write TIFF files, among other formats: BufferedImage image = ImageIO.read( new File( "image.tif" ) );
  2. Find the Java equivalent of the code that gets the bitmap's properties from the PDF, for example pd.Get(PdfName.WIDTH).ToString() (which is C#).
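
For reference, the article's idea (wrap the raw CCITT stream in a TIFF container instead of decoding it) can be sketched in plain Java without a TIFF library. This is only a minimal sketch under several assumptions: the class name CcittTiffWrapper and its wrap method are hypothetical, the stream is assumed to be Group 4 data (/K < 0 in the image's /DecodeParms) stored as a single strip, and the photometric interpretation is assumed to be WhiteIsZero, which may need flipping depending on /BlackIs1. The width, height, and raw bytes would have to come from the PDF image dictionary and stream.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class CcittTiffWrapper {
    private static final int ENTRY_COUNT = 8;

    /** Wraps raw CCITT Group 4 bytes in a single-strip, little-endian TIFF (hypothetical helper). */
    static byte[] wrap(int width, int height, byte[] g4Data) {
        int ifdSize = 2 + ENTRY_COUNT * 12 + 4;   // entry count + entries + next-IFD offset
        int dataOffset = 8 + ifdSize;             // strip data follows the 8-byte header and the IFD
        ByteBuffer buf = ByteBuffer.allocate(dataOffset + g4Data.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);

        buf.put((byte) 'I').put((byte) 'I');      // "II" = little-endian byte order
        buf.putShort((short) 42);                 // TIFF magic number
        buf.putInt(8);                            // offset of the first (and only) IFD

        buf.putShort((short) ENTRY_COUNT);
        entry(buf, 256, 4, width);                // ImageWidth (LONG)
        entry(buf, 257, 4, height);               // ImageLength (LONG)
        entry(buf, 258, 3, 1);                    // BitsPerSample = 1 (SHORT)
        entry(buf, 259, 3, 4);                    // Compression = 4, i.e. CCITT T.6 / Group 4
        entry(buf, 262, 3, 0);                    // Photometric = WhiteIsZero (assumption)
        entry(buf, 273, 4, dataOffset);           // StripOffsets
        entry(buf, 278, 4, height);               // RowsPerStrip: whole image in one strip
        entry(buf, 279, 4, g4Data.length);        // StripByteCounts
        buf.putInt(0);                            // no further IFDs

        buf.put(g4Data);
        return buf.array();
    }

    // Writes one 12-byte IFD entry holding a single value. In little-endian order
    // a SHORT written with putInt lands in the first two value bytes, as TIFF expects.
    private static void entry(ByteBuffer buf, int tag, int type, int value) {
        buf.putShort((short) tag);
        buf.putShort((short) type);
        buf.putInt(1);
        buf.putInt(value);
    }
}
```

The result of CcittTiffWrapper.wrap(width, height, rawBytes) could then be written straight to a .tif file. In iText 5 I believe the undecoded bytes are available through PdfReader.getStreamBytesRaw, but that part is left out to keep the sketch library-free.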
Mary asked Oct 11 '22 09:10



1 Answer

I extracted TIFF images from a scanned PDF (one in which every page is an image) in the following way:

...
PdfReader reader = new PdfReader("source.pdf");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener("destination.jpg");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    parser.processContent(i, listener);
}
...

Code of the MyImageRenderListener class:

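// Imports assume iText 5.x package names (com.itextpdf).
import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.imageio.ImageIO;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

class MyImageRenderListener implements RenderListener {
    protected String path = "";
    protected int imageCount = 0;

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    public void beginTextBlock() {
    }

    public void endTextBlock() {
    }

    public void renderImage(ImageRenderInfo renderInfo) {
        try {
            PdfImageObject image = renderInfo.getImage();
            // /Filter may be a name or an array, so compare without casting
            PdfObject filter = image.get(PdfName.FILTER);

            if (PdfName.CCITTFAXDECODE.equals(filter)) {
                // Number each file so later pages do not overwrite earlier ones
                int dot = path.lastIndexOf('.');
                String base = dot < 0 ? path : path.substring(0, dot);
                String filename = base + "-" + (++imageCount) + ".jpg";

                BufferedImage bufferedImage = image.getBufferedImage();
                try (OutputStream os = new FileOutputStream(filename)) {
                    ImageIO.write(bufferedImage, "jpg", os); // save tif image as jpg
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void renderText(TextRenderInfo renderInfo) {
    }
}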
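
Since the question asked for a TIFF rather than a JPEG, the decoded BufferedImage could also be written back out with CCITT Group 4 compression through Image I/O. The following is a minimal sketch, assuming Java 9 or later (where javax.imageio ships a TIFF writer) or an equivalent plugin on older versions. The class name TiffG4Writer and the method toTiff are hypothetical, and "CCITT T.6" is the compression type name I would expect the JDK writer to expose for Group 4.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

class TiffG4Writer {
    /** Encodes an image as a bilevel TIFF with CCITT T.6 (Group 4) compression. */
    static byte[] toTiff(BufferedImage source) throws IOException {
        // CCITT compression only applies to 1-bit images, so redraw into a bilevel image first
        BufferedImage bilevel = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        Graphics2D g = bilevel.createGraphics();
        g.drawImage(source, 0, 0, null);
        g.dispose();

        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IOException("No TIFF ImageWriter found (needs Java 9+ or a plugin)");
        }
        ImageWriter writer = writers.next();

        // Ask the writer for Group 4 compression explicitly
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionType("CCITT T.6");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(bilevel, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

Inside renderImage, the returned bytes from TiffG4Writer.toTiff(image.getBufferedImage()) could be written to a .tif file in place of the ImageIO.write call. Note that this decodes and re-encodes the image instead of copying the original stream.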
Mihai answered Oct 14 '22 02:10
