
In PDFBox, why does the file size become extremely large after saving?

Tags: java, pdf, pdfbox

Question

I am using PDFBox 1.8.8 to manipulate existing PDF files. After saving a document, the output file becomes several times larger than the original. This is undesirable.

How can I reduce the file size of output files?

How to replicate my situation

In the following code, PDFBox simply loads an existing PDF and then saves it. Nothing else is done, yet the output file is still several times larger than the input.

Below are links to two sample input files. For input1.pdf, the file size increases from 6 MB to 50 MB. For input2.pdf, it increases from 0.4 MB to 1.3 MB.

https://dl.dropboxusercontent.com/u/13566649/samplePDF/input1.pdf
https://dl.dropboxusercontent.com/u/13566649/samplePDF/input2.pdf

import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.exceptions.*;


class Test {

    public static void main(String[] args) throws IOException, COSVisitorException {

        PDDocument document = PDDocument.load("input1.pdf");
        document.save("output.pdf");
        document.close();       
    }
}   
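
To put a number on the growth described above, the following is a minimal sketch that compares the sizes of two files on disk. It uses only the Java standard library, not PDFBox. The class name SizeGrowth, the helper growthFactor, and the default file names are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SizeGrowth {

    /** Ratio of output size to input size; for example, 6 MB -> 50 MB is roughly 8.3x. */
    static double growthFactor(long inputBytes, long outputBytes) {
        if (inputBytes <= 0) {
            throw new IllegalArgumentException("input size must be positive");
        }
        return (double) outputBytes / inputBytes;
    }

    public static void main(String[] args) throws IOException {
        // File names default to the ones used in the question (assumed to exist).
        Path input = Paths.get(args.length > 0 ? args[0] : "input1.pdf");
        Path output = Paths.get(args.length > 1 ? args[1] : "output.pdf");

        long in = Files.size(input);
        long out = Files.size(output);
        System.out.printf("%s: %d bytes, %s: %d bytes, growth %.1fx%n",
                input, in, output, out, growthFactor(in, out));
    }
}
```

Running it after each experiment gives a quick check of whether a change to the saving code made any difference to the output size.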

What I have tried

I have tried the addCompression() method of the PDStream class, as in the following code. It changes nothing: the output file size stays the same.

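import java.io.*;
import java.util.List;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.exceptions.*;

class Test2 {

    public static void main(String[] args) throws IOException, COSVisitorException {

        PDDocument document = PDDocument.load("input1.pdf");

        // Fetch the page list once instead of on every iteration.
        List<?> pages = document.getDocumentCatalog().getAllPages();
        for (Object o : pages) {
            PDPage page = (PDPage) o;
            PDStream contents = page.getContents();
            // A page without a content stream yields null here.
            if (contents != null) {
                contents.addCompression();
            }
        }

        document.save("output.pdf");
        document.close();
    }
}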
Asked by Brian, Oct 31 '22 09:10
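
The addCompression() attempt in the question concerns Flate compression of page content streams. As a rough illustration of how much that kind of compression can matter for stream data, here is a standalone sketch using only java.util.zip, not PDFBox. It assumes that PDF's FlateDecode filter corresponds to zlib/deflate data, which is what Deflater produces by default. The class FlateSketch and the synthetic content stream are hypothetical and are not taken from the sample files. The sketch does not establish why the files in the question grow; it only shows the size gap between the same bytes stored raw and deflated.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateSketch {

    /** Compresses bytes into zlib/deflate format. */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Reverses deflate(); throws IllegalStateException on malformed input. */
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported deflate data");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Hypothetical stand-in for an uncompressed page content stream. */
    static byte[] sampleContentStream(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("BT /F1 12 Tf 72 ").append(720 - (i % 50) * 14)
              .append(" Td (Sample line ").append(i).append(") Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream(2000);
        byte[] packed = deflate(raw);
        System.out.printf("raw: %d bytes, deflated: %d bytes%n", raw.length, packed.length);
    }
}
```

Repetitive operator text like this shrinks substantially under deflate, so a save that writes streams without compression could plausibly account for part of a size increase. Whether that is what happens with input1.pdf and input2.pdf would need to be checked against the actual output files.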


1 Answer

I wrote this strange code and it works for me (Apache PDFBox v.2.0.8):

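// Re-adds the pages of srcDoc to a new document and saves that instead.
// Keep srcDoc open until this method returns: the pages are reused, not deep-copied.
private void saveCompressedPDF(PDDocument srcDoc, OutputStream os) throws IOException {
    PDDocument outDoc = new PDDocument();
    try {
        outDoc.setDocumentInformation(srcDoc.getDocumentInformation());
        for (PDPage srcPage : srcDoc.getPages()) {
            // Open and immediately close an appended content stream;
            // the final argument (true) requests compression for it.
            new PDPageContentStream(outDoc, srcPage,
                    PDPageContentStream.AppendMode.APPEND, true).close();
            outDoc.addPage(srcPage);
        }
        outDoc.save(os);
    } finally {
        outDoc.close();
    }
}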
Answered by kinjelom, Nov 14 '22 22:11