
PDFBox 1.8.10: Fill and Sign PDF produces invalid signatures

I programmatically fill in a form (AcroForm) in a PDF document and sign the document afterwards. I start with doc.pdf and create doc_filled.pdf using the setFields.java example of PDFBox. Then I sign doc_filled.pdf, creating doc_filled_signed.pdf, using some code based on the signature examples, and open the PDF in Acrobat Reader. The entered field data is visible, and the signature panel tells me

"There are errors in the formatting or information contained in this signature (The signature byte array is invalid)"

So far, I know that:

  • the signature code applied alone (i.e. directly creating some doc_signed.pdf) creates a valid signature
  • the problem exists for invisible signatures, for visible signatures, and for visible signatures added to existing signature fields.
  • the problem even occurs if I do not fill in the form but only open and save it, i.e.:

    PDDocument doc = PDDocument.load(new File("doc.pdf"));
    doc.save(new File("doc_filled.pdf"));
    doc.close();
    

suffices to break the signing code applied afterwards.

On the other hand, if I take the same doc.pdf and enter the field values manually in Adobe, the signing code produces valid signatures.

What am I doing wrong?

Update:

@mkl asked me to provide the files I am talking about (I currently do not have enough reputation to post all files as links; sorry for the inconvenience):

  • doc.pdf: https://www.dropbox.com/s/ev8x9q48w5l0hof/doc.pdf?dl=0
  • doc_filled.pdf: https://www.dropbox.com/s/fxn4gyneizs1zzb/doc_filled.pdf?dl=0
  • doc_filled_signed.pdf: https://www.dropbox.com/s/xm846sj8f9kiga9/doc_filled_signed.pdf?dl=0
  • doc_filled_and_signed.pdf: https://www.dropbox.com/s/5jftje6ke87jedr/doc_filled_and_signed.pdf?dl=0

The last one was created by filling in and signing the document in one go, using

    doc.saveIncremental(); 

As I already wrote in the comment, some

    setNeedToBeUpdate(true);

seems to be missing, though. Following @mkl's second comment, I found this SO question: Saved Text Field value is not displayed properly in PDF generated using PDFBOX, which also covers entered text not being shown. I gave it a first try, applying

    setBoolean(COSName.getPDFName("NeedAppearances"), true); 

to the field's and the form's dictionaries. This makes the field's content show up, but the signature does not get added in the end. I still have to look further into that.

Update: The story continues here: PDFBox 1.8.10: Fill and Sign Document, Filling again fails

Daniel Heldt asked Oct 01 '15 09:10



1 Answer

The cause of the OP's original problem (after loading his PDF with PDFBox for form fill-in and then saving it, the new PDF cannot be signed successfully using PDFBox signing code) has already been explained in detail in this answer. In short:

  • When saving documents regularly, PDFBox does so using a cross reference table.

    • If the document to save regularly had been loaded from a PDF with a cross reference stream, all entries of the cross reference stream dictionary are saved in the trailer dictionary.
  • When saving documents in the process of applying a signature, PDFBox creates an incremental update; as such incremental updates require that the update uses the same kind of cross reference as the original revision, PDFBox in this case tries to use the same technique.

    • To recognize the technique originally used, PDFBox looks at the Type entry of the dictionary in its document representation into which the trailer or cross reference stream dictionary had been loaded: if there is a Type entry with value XRef (as is specified for cross reference streams), a stream is assumed, otherwise a table.

Thus, in the case of the OP's original PDF doc.pdf which has a cross reference stream:

  • After loading and form fill-in the document is saved regularly, i.e. using a cross reference table, but all the former cross reference stream entries, among them the Type, are copied to the trailer. (doc_filled.pdf)

  • After loading this saved PDF with a cross reference table for signing, it is saved again using an incremental update. PDFBox assumes (due to the Type trailer entry) that the existing file has a cross reference stream and, therefore, uses a cross reference stream at the end of the incremental update, too. (doc_filled_signed.pdf)

  • Thus, in the end the filled-in, then signed PDF has two revisions, the inner one with a cross reference table, the outer one with a cross reference stream.

  • As this is not valid, Adobe Reader repairs it in its internal document representation upon loading the PDF. Repairing changes the document bytes. Thus, in Adobe Reader's eyes the signature is broken.

  • Most other signature validators don't attempt such repairs but check the signature of the document as is. They validate the signature successfully.
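Whether a file has ended up in this mixed state can be checked without PDFBox. The following is a minimal sketch that relies only on the PDF file structure: each revision ends with a startxref keyword followed by a byte offset, and that offset points either at the keyword xref (a table) or at an indirect object header (a stream). The class name XrefKindScanner is made up for illustration. The scan is a heuristic: it would be misled by the text "startxref" appearing inside stream data, and it does not handle hybrid-reference files.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: reports the cross reference kind
// that each startxref entry in a PDF file points to, in file order.
class XrefKindScanner {
    enum Kind { TABLE, STREAM, UNKNOWN }

    // An indirect object header such as "12 0 obj".
    private static final Pattern OBJ_HEADER = Pattern.compile("\\d+\\s+\\d+\\s+obj");

    static List<Kind> scan(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indexes equal byte offsets.
        String s = new String(pdf, StandardCharsets.ISO_8859_1);
        List<Kind> kinds = new ArrayList<>();
        int pos = 0;
        while ((pos = s.indexOf("startxref", pos)) >= 0) {
            pos += "startxref".length();
            int i = pos;
            while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
            int start = i;
            while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            kinds.add(i > start ? classify(s, s.substring(start, i)) : Kind.UNKNOWN);
        }
        return kinds;
    }

    private static Kind classify(String s, String digits) {
        long offset;
        try {
            offset = Long.parseLong(digits);
        } catch (NumberFormatException e) {
            return Kind.UNKNOWN;
        }
        if (offset >= s.length()) return Kind.UNKNOWN;
        int at = (int) offset;
        // A cross reference table starts with the keyword "xref".
        if (s.startsWith("xref", at)) return Kind.TABLE;
        // A cross reference stream is an ordinary indirect object.
        Matcher m = OBJ_HEADER.matcher(s);
        m.region(at, s.length());
        return m.lookingAt() ? Kind.STREAM : Kind.UNKNOWN;
    }

    public static void main(String[] args) throws Exception {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(scan(pdf));
    }
}
```

Run with a file path as the only argument. A list that contains both TABLE and STREAM would match the situation described above, where the inner revision uses a table and the signing revision uses a stream.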

The answer referenced above also offers some ways around this:

  • A: After loading the PDF for form fill-in, remove the Type entry from the trailer before saving regularly. If signing is applied to this file, PDFBox will assume a cross reference table (because the misleading Type entry is not there). Thus, the signature incremental update will be valid.

  • B: Use an incremental update for saving the form fill-in changes, too, either in a separate run or in the same run as signing. This also results in a valid incremental update.
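Why option A helps can be seen in a deliberately simplified model. This is not PDFBox code: dictionaries are plain maps, the class and method names are invented for illustration, and the Size and Root values are placeholders. It only mirrors the two behaviours described above, namely that a regular save copies every loaded entry into the new trailer, and that the later check looks at nothing but the Type entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model, not PDFBox code: a dictionary is a map of name -> value.
class XrefTechniqueModel {

    // Mirrors the check described above: Type = XRef means "stream", anything else "table".
    static boolean looksLikeXrefStream(Map<String, String> trailer) {
        return "XRef".equals(trailer.get("Type"));
    }

    // Models a regular save: the file is written with a cross reference table,
    // but all entries of the loaded dictionary end up in the new trailer.
    static Map<String, String> trailerAfterRegularSave(Map<String, String> loaded) {
        return new LinkedHashMap<>(loaded);
    }

    public static void main(String[] args) {
        Map<String, String> loaded = new LinkedHashMap<>();
        loaded.put("Type", "XRef");  // present because the source file used a cross reference stream
        loaded.put("Size", "42");    // placeholder value
        loaded.put("Root", "1 0 R"); // placeholder value

        // Without the workaround the table-based file is mistaken for a stream-based one.
        Map<String, String> saved = trailerAfterRegularSave(loaded);
        System.out.println("stream assumed after regular save: " + looksLikeXrefStream(saved));

        // Option A: drop the misleading Type entry before saving.
        loaded.remove("Type");
        Map<String, String> fixed = trailerAfterRegularSave(loaded);
        System.out.println("stream assumed with Type removed: " + looksLikeXrefStream(fixed));
    }
}
```

In real PDFBox code the same step would be applied to the trailer dictionary of the loaded document before the regular save; the exact calls depend on the PDFBox version in use.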

Generally I would propose the latter option because the former option likely will break if the PDFBox saving routines ever are made compatible with each other.

Unfortunately, though, the latter option requires marking the added and changed objects as updated, including a path from the document catalog. If this is not possible or at least too cumbersome, the first option might be preferable.


In the case at hand the OP tried the latter option (doc_filled_and_signed.pdf):

"At the moment the text box's content is only visible when the text box is selected (the same behaviour with Acrobat Reader and Preview). I flag the PDField, all of its parents, the AcroForm, the Catalog as well as the page where it is displayed."

He marked the changed field as updated, but not the associated appearance stream, which PDFBox generates automatically when the form field value is set.

Thus, in the result PDF file the field has the new value but the old, empty appearance stream. Only when clicking into the field, Adobe Reader creates a new appearance based on the value for editing.

Thus, the OP also has to mark the new normal appearance stream (the form field dictionary contains an entry AP referencing a dictionary in which N references the normal appearance stream). Alternatively (if finding the changed or added entries becomes too cumbersome) he might try the other option.
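The effect of a missing update flag can be illustrated with a small model of an incremental update. This is a sketch under simplifying assumptions, not PDFBox code: a revision is a map from object number to content, the object numbers 10 (field dictionary) and 11 (normal appearance stream) and their contents are invented, and the requirement of a flagged path from the document catalog is not modelled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model, not PDFBox code: a revision maps object numbers to content.
class IncrementalUpdateModel {

    // An in-memory object with its current content and an "updated" flag.
    static final class Obj {
        final String content;
        final boolean needToBeUpdated;

        Obj(String content, boolean needToBeUpdated) {
            this.content = content;
            this.needToBeUpdated = needToBeUpdated;
        }
    }

    // Only objects flagged as updated are written into the incremental update.
    static Map<Integer, String> incrementalUpdate(Map<Integer, Obj> inMemory) {
        Map<Integer, String> update = new LinkedHashMap<>();
        for (Map.Entry<Integer, Obj> e : inMemory.entrySet()) {
            if (e.getValue().needToBeUpdated) {
                update.put(e.getKey(), e.getValue().content);
            }
        }
        return update;
    }

    // A reader sees the original revision, with objects from the update replacing older versions.
    static Map<Integer, String> readerView(Map<Integer, String> original, Map<Integer, String> update) {
        Map<Integer, String> view = new LinkedHashMap<>(original);
        view.putAll(update);
        return view;
    }

    public static void main(String[] args) {
        Map<Integer, String> original = new LinkedHashMap<>();
        original.put(10, "field: V=()");        // hypothetical field dictionary, empty value
        original.put(11, "appearance: empty");  // hypothetical normal appearance stream

        Map<Integer, Obj> inMemory = new LinkedHashMap<>();
        inMemory.put(10, new Obj("field: V=(John)", true));    // flagged by the OP
        inMemory.put(11, new Obj("appearance: John", false));  // regenerated but not flagged

        // The field carries the new value while the appearance stays the old, empty one.
        System.out.println(readerView(original, incrementalUpdate(inMemory)));
    }
}
```

Flagging object 11 as well makes the new appearance part of the update, which corresponds to marking the stream referenced by N in the field's AP dictionary.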

mkl answered Oct 22 '22 11:10
