How to get the names of AcroField revisions in iText?

Tags:

I'm currently working on a tool for applying and verifying digital signatures on pdf documents. I understand that each signature in a pdf document is applied to a certain revision of the AcroFields. Each time the user changes some input (i.e. by filling out a pdf form), a new revision is created.

My question is: how can I retrieve all revisions from an AcroFields object? I know how to get a single revision if I have its name like this:

AcroFields fields = ...;
fields.getRevision("revision1");

But how can I retrieve all revisions (or their names, at least)? I haven't found anything in the iText API and the web so far.

I'm using iText in version 2.1.7.

Thanks and Regards, Tom

851

asked Jun 18 '13 12:06

Tom

1 Answers

Some backgrounds first

Normally a PDF file consists of a header, a body, cross reference information, and a trailer, see figure 2 below. When updating such a PDF file, you have the choice of

either building the whole document anew with all the changes integrated (which results in a PDF again formed like the original one)
or you can append the changes of body elements and cross references to the document and add a new trailer also referencing the former trailer (which results in a PDF formed like depicted in figure 3 below).

Actually there are some forms in-between, though. E.g. some tools merely cut off the cross references and the trailer of the original document and then add their new or changed body elements, new complete cross references, and a new trailer without any back reference to some former state.

Initial structure of a PDF file Structure of an updated PDF file

(Images copied from the PDF specification ISO 32000-1:2008)

In case of PDFs formed like in figure 3, we have a history of different states of the PDF at hand, each starting at the start of the file and reaching up to and including one of the trailers. These states are commonly named revisions of the document, and each revision of the document obviously reflects some state of the form information of the PDF which I assume is what you call revision of the AcroFields.

There is some imprecision in the exact point where a trailer stops and the next body update starts. On one hand there are some choices imminent in the specification (different possible line breaks, ignored white space, ignored comment lines) and on the other hand many PDF producers are a bit beyond the specification anyways. This in combination with the in-between varieties between full and incremental updates mentioned above can make the process of extracting the revisions somewhat troublesome sometimes.

There is a special case of revisions which can be recognized with a great degree of reliability: Signed revisions, i.e. revisions whose last body update contains an integrated signature for the document. As the signed byte ranges of a document must encompass all of the document revision but the gap left for the signature itself (at least to be accepted by Adobe software and to be conform to PAdES and PDF-2 standards), the exact end of the revision in that case can be deduced from the signature information:

Multiple integrated signatures

More details here.

Some answers to your questions

I understand that each signature in a pdf document is applied to a certain revision of the AcroFields.

As mentioned above, each is applied to a certain revision of the document which then implies a certain state or "revision" of the form data.

Each time the user changes some input (i.e. by filling out a pdf form), a new revision is created.

Not necessarily. As mentioned above there are many in-between methods for updates.

Only if changing the information of a document whose latest revision is signed, a proper incremental update is necessary if that signature is not to be removed or invalidated. Otherwise an updater can take all information added after the last signature, create an own update with any content he wishes, and append that update to the last signed revision of the document. This update may even contain multiple virtual update blocks with the intention of making you believe certain intermediate revisions actually existed.

Thus, only signed revisions can be somehow trusted to be true. iText only provides access to such signed revisions.

My question is: how can I retrieve all revisions from an AcroFields object?

You can extract all the signed document revisions using

InputStream revisionStream = fields.extractRevision("name");

and open them in separate PdfReader instances. Then you can access the PDF form information of each of these signed revisions by querying the AcroFields instance of the respective PdfReader opened for that revision.

(BTW, the String argument is not the name of the revision but the name of the signature field whose signature signs that revision.)

But how can I retrieve all revisions (or their names, at least)? I haven't found anything in the iText API and the web so far.

As mentioned before, those revision names actually are signature field names. Thus, you can use

List<String> names = fields.getSignatureNames()

to retrieve all names for which a revision can be extracted.

168

answered Dec 13 '22 20:12

mkl

Related questions
                            
                                Saving Word DOCX files as PDF [closed]
                            
                                Converting html to pdf with Arabic
                            
                                How to convert PDF to text file in iTextSharp
                            
                                highlight text using pdfbox when it's location in the pdf is known
                            
                                How to add page numbers to PDF file using python and matplotlib?
                            
                                MigraDoc headers and footers
                            
                                matplotlib show many images in single pdf page
                            
                                In TCPDF, how to get current page number as integer?
                            
                                How to disable the built-in PDF viewer in Opera [closed]
                            
                                PDFBox Error When Converting to BufferedImage: NoClassDefFoundError: org/apache/fontbox/FontBoxFont
                            
                                Laravel PDF: No block-level parent found. Not good
                            
                                Google maps and pdf
                            
                                how to use imageMagick with C#
                            
                                PDFkit doesn't display pictures in PDF
                            
                                How do I design and implement password authentication for IOS?
                            
                                Display Pdf in browser using java servlet
                            
                                .NET C# - MigraDoc - How to change document charset?
                            
                                Not applying the CSS while generating PDF using iTextsharp.dll
                            
                                Use this php code to open a pdf in a new tab
                            
                                MigraDoc C# Align left and right on same line

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to get the names of AcroField revisions in iText?

Tags:

pdf

signature

itext

acrofields

Tom

People also ask

1 Answers

mkl

Recent Activity

Donate For Us