
Protecting PDF using PDFBox

Tags: java, pdf, pdfbox

I'm really struggling with the documentation for PDFBox. For such a popular library, information seems a little thin on the ground (for me, at least).

Anyway, the problem I'm having relates to protecting the PDF. At the moment all I want is to control the users' access permissions. Specifically, I want to prevent the user from modifying the PDF.

If I omit the access permission code everything works perfectly. I am reading in a PDF from an external resource. I am then reading and populating the fields, adding some images before saving the new PDF. That all works perfectly.

The problem comes when I add the following code to manage the access:

/* Secure the PDF so that it cannot be edited */
try {
    String ownerPassword = "DSTE$gewRges43";
    String userPassword = "";

    AccessPermission ap = new AccessPermission();
    ap.setCanModify(false);

    StandardProtectionPolicy spp = new StandardProtectionPolicy(ownerPassword, userPassword, ap);
    pdf.protect(spp);
} catch (BadSecurityHandlerException ex) {
    Logger.getLogger(PDFManager.class.getName()).log(Level.SEVERE, null, ex);
}

When I add this code, all the text and images are stripped from the outgoing PDF. The fields are still present in the document, but they are all empty, and all the text and images that were part of the original PDF or that were added dynamically in the code are gone.

UPDATE: OK, as best I can tell, the problem comes from a bug relating to the form fields. I'm going to try a different approach without the form fields and see what happens.

tarka asked Oct 25 '12 11:10



1 Answer

I found the solution to this problem. A PDF that comes from an external source is sometimes already protected or encrypted.

If you get blank output after loading a PDF from an external source and adding protection, you are probably working with an encrypted document. I have a stream-processing system that works on PDF documents, and the following code works for me. If you are only working with PDF inputs, you could integrate the code below into your flow.

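import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import org.apache.pdfbox.pdmodel.PDDocument;

// PDFBox 1.x API
public InputStream convertDocument(InputStream dataStream) throws Exception {
    // Acts as a pass-through (the input is already a PDF) that strips any existing encryption.
    // The output is buffered in memory: a PipedOutputStream written on the same thread
    // that later reads it blocks once the pipe's small buffer fills up.
    ByteArrayOutputStream os = new ByteArrayOutputStream();

    System.setProperty("org.apache.pdfbox.baseParser.pushBackSize", "2024768"); // for large files

    PDDocument doc = PDDocument.load(dataStream, true); // true = skip corrupt objects and keep parsing
    try {
        if (doc.isEncrypted()) { // remove the existing security before adding new protection
            doc.decrypt(""); // try the empty user password
            doc.setAllSecurityToBeRemoved(true);
        }
        doc.save(os);
    } finally {
        doc.close();
        dataStream.close();
    }
    return new ByteArrayInputStream(os.toByteArray());
}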
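
A note on handing the result back as an InputStream: java.io pipes are meant to connect two threads, and a PipedOutputStream that is written on the same thread that will later read it blocks once its buffer (1024 bytes by default) is full. Buffering in memory avoids that. The following is a minimal, stdlib-only sketch of the pattern; BufferedHandOff and OutputWriter are hypothetical names for illustration and are not part of PDFBox.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: buffers everything a writer produces and re-exposes it as an InputStream. */
class BufferedHandOff {

    /** Stand-in for anything that writes to an OutputStream, such as a document's save call. */
    interface OutputWriter {
        void writeTo(OutputStream out) throws Exception;
    }

    static InputStream toInputStream(OutputWriter writer) throws Exception {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writer.writeTo(buffer); // runs to completion; no reader thread is needed
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public static void main(String[] args) throws Exception {
        // 64 KiB payload, far larger than a pipe's default 1024-byte buffer
        byte[] payload = new byte[64 * 1024];
        InputStream in = toInputStream(out -> out.write(payload));
        System.out.println(in.available()); // prints 65536
    }
}
```

With PDFBox, the writer would be the document's save call, for example toInputStream(out -> doc.save(out)). The trade-off is that the whole PDF is held in memory at once.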

Now take the returned InputStream and use it as the input when applying your security settings:

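import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.UUID;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

// Buffer the output in memory; a pipe would block here without a separate reader thread.
ByteArrayOutputStream os = new ByteArrayOutputStream();

System.setProperty("org.apache.pdfbox.baseParser.pushBackSize", "2024768");
InputStream dataStream = secureData.data(); // the InputStream returned by convertDocument above

PDDocument doc = PDDocument.load(dataStream, true);
AccessPermission ap = new AccessPermission();
// Set whatever permissions you need
ap.setCanModify(false);
ap.setCanExtractContent(false);
ap.setCanPrint(false);
ap.setCanPrintDegraded(false);
ap.setReadOnly(); // locks this AccessPermission object against further changes, so call it last

// Random owner password, empty user password: anyone can open the file, restricted by ap
StandardProtectionPolicy spp = new StandardProtectionPolicy(UUID.randomUUID().toString(), "", ap);

doc.protect(spp);

doc.save(os);
doc.close();
dataStream.close();
// os.toByteArray() now holds the protected PDF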

Now this should return a proper document with no blank output!

The trick is to remove the existing encryption first!
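
For background on what the AccessPermission setters control: the PDF specification stores user permissions as bit flags in a 32-bit integer, numbered from 1 at the low-order end, and each setter corresponds to setting or clearing one of those bits. The sketch below is plain Java, not PDFBox code. The class and method names are made up for illustration, and the bit positions are taken from the specification's permission table (bit 3 print, bit 4 modify, bit 5 extract).

```java
/** Illustration only: hypothetical helper mirroring the PDF permission flag layout. */
class PermissionBits {
    // Bit positions are 1-based, as in the PDF specification.
    static final int PRINT = 3;
    static final int MODIFY = 4;
    static final int EXTRACT = 5;

    /** Everything granted: all bits set except bits 1 and 2, which must be 0. */
    static final int ALL_GRANTED = ~3;

    /** Returns the flags with the given 1-based permission bit cleared. */
    static int revoke(int flags, int bit) {
        return flags & ~(1 << (bit - 1));
    }

    /** Returns true if the given 1-based permission bit is set. */
    static boolean isGranted(int flags, int bit) {
        return (flags & (1 << (bit - 1))) != 0;
    }

    public static void main(String[] args) {
        int flags = revoke(ALL_GRANTED, MODIFY);
        System.out.println(isGranted(flags, MODIFY));   // prints false
        System.out.println(isGranted(flags, PRINT));    // prints true
        System.out.println(Integer.toHexString(flags)); // prints fffffff4
    }
}
```

Viewers that honour these flags refuse the revoked operations when the file is opened with the user password; opening it with the owner password lifts the restrictions.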

NightWolf answered Oct 01 '22 18:10
