I'm currently working on a tool for applying and verifying digital signatures on pdf documents. I understand that each signature in a pdf document is applied to a certain revision of the AcroFields. Each time the user changes some input (i.e. by filling out a pdf form), a new revision is created.
My question is: how can I retrieve all revisions from an AcroFields object? I know how to get a single revision if I have its name like this:
AcroFields fields = ...;
fields.getRevision("revision1");
But how can I retrieve all revisions (or their names, at least)? I haven't found anything in the iText API and the web so far.
I'm using iText in version 2.1.7.
Thanks and Regards, Tom
The AcroFields are read from the document and saved to the database using a parameterized query, AcroFields are across the whole document and are not referenced per page. The following field data is saved in the database if the field has data: File name, Field name, Field value, Field type, File type.
In contrast to your assumption, these revisions do not have a name per se. Unless you use the second part of the ID (which should be different for different revisions) that is, but AFAIK that is not used as a name for anything in iText. There is some imprecision in the exact point where a trailer stops and the next body update starts.
These states are commonly named revisions of the document, and each revision of the document obviously reflects some state of the form information of the PDF which I assume is what you call revision of the AcroFields.
“ Number: (?<one>.+)Rev: ” – find characters between Number: and Rev: and assign to group “one” must be re-written as - “ Number: (?<one>>.+)Rev: ” The AcroFields are read from the document and saved to the database using a parameterized query, AcroFields are across the whole document and are not referenced per page.
Some backgrounds first
Normally a PDF file consists of a header, a body, cross reference information, and a trailer, see figure 2 below. When updating such a PDF file, you have the choice of
Actually there are some forms in-between, though. E.g. some tools merely cut off the cross references and the trailer of the original document and then add their new or changed body elements, new complete cross references, and a new trailer without any back reference to some former state.
(Images copied from the PDF specification ISO 32000-1:2008)
In case of PDFs formed like in figure 3, we have a history of different states of the PDF at hand, each starting at the start of the file and reaching up to and including one of the trailers. These states are commonly named revisions of the document, and each revision of the document obviously reflects some state of the form information of the PDF which I assume is what you call revision of the AcroFields.
In contrast to your assumption, these revisions do not have a name per se. Unless you use the second part of the ID (which should be different for different revisions) that is, but AFAIK that is not used as a name for anything in iText.
There is some imprecision in the exact point where a trailer stops and the next body update starts. On one hand there are some choices imminent in the specification (different possible line breaks, ignored white space, ignored comment lines) and on the other hand many PDF producers are a bit beyond the specification anyways. This in combination with the in-between varieties between full and incremental updates mentioned above can make the process of extracting the revisions somewhat troublesome sometimes.
There is a special case of revisions which can be recognized with a great degree of reliability: Signed revisions, i.e. revisions whose last body update contains an integrated signature for the document. As the signed byte ranges of a document must encompass all of the document revision but the gap left for the signature itself (at least to be accepted by Adobe software and to be conform to PAdES and PDF-2 standards), the exact end of the revision in that case can be deduced from the signature information:
More details here.
Some answers to your questions
I understand that each signature in a pdf document is applied to a certain revision of the AcroFields.
As mentioned above, each is applied to a certain revision of the document which then implies a certain state or "revision" of the form data.
Each time the user changes some input (i.e. by filling out a pdf form), a new revision is created.
Not necessarily. As mentioned above there are many in-between methods for updates.
Only if changing the information of a document whose latest revision is signed, a proper incremental update is necessary if that signature is not to be removed or invalidated. Otherwise an updater can take all information added after the last signature, create an own update with any content he wishes, and append that update to the last signed revision of the document. This update may even contain multiple virtual update blocks with the intention of making you believe certain intermediate revisions actually existed.
Thus, only signed revisions can be somehow trusted to be true. iText only provides access to such signed revisions.
My question is: how can I retrieve all revisions from an AcroFields object?
You can extract all the signed document revisions using
InputStream revisionStream = fields.extractRevision("name");
and open them in separate PdfReader
instances. Then you can access the PDF form information of each of these signed revisions by querying the AcroFields
instance of the respective PdfReader
opened for that revision.
(BTW, the String
argument is not the name of the revision but the name of the signature field whose signature signs that revision.)
But how can I retrieve all revisions (or their names, at least)? I haven't found anything in the iText API and the web so far.
As mentioned before, those revision names actually are signature field names. Thus, you can use
List<String> names = fields.getSignatureNames()
to retrieve all names for which a revision can be extracted.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With