What is the ID field in a pdf file?

2 Answers

Some remarks to add to the picture from @Millie's answer:

When in doubt about some aspects of PDF, the first place to look should be the specification ISO 32000-1.

It specifies the ID entry as:

ID array (Required if an Encrypt entry is present; optional otherwise; PDF 1.1)

An array of two byte-strings constituting a file identifier (see 14.4, "File Identifiers") for the file. If there is an Encrypt entry this array and the two byte-strings shall be direct objects and shall be unencrypted.

NOTE 1 Because the ID entries are not encrypted it is possible to check the ID key to assure that the correct file is being accessed without decrypting the file. The restrictions that the string be a direct object and not be encrypted assure that this is possible.

NOTE 2 Although this entry is optional, its absence might prevent the file from functioning in some workflows that depend on files being uniquely identified.

NOTE 3 The values of the ID strings are used as input to the encryption algorithm. If these strings were indirect, or if the ID array were indirect, these strings would be encrypted when written. This would result in a circular condition for a reader: the ID strings must be decrypted in order to use them to decrypt strings, including the ID strings themselves. The preceding restriction prevents this circular condition.

(Table 15 – Entries in the file trailer dictionary)
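To make the trailer entry concrete, here is a minimal Python sketch that scans raw PDF bytes for an /ID array. It is an illustration, not a real PDF parser: the names `find_file_identifier` and `sample_trailer`, the regular expression, and the sample bytes are my own assumptions, and it only handles the common case where both strings are written as hex strings (`<...>`), not literal strings (`(...)`).

```python
import re

# Matches "/ID [ <hex> <hex> ]" where both strings are written as hex strings.
ID_PATTERN = re.compile(
    rb"/ID\s*\[\s*<([0-9A-Fa-f\s]*)>\s*<([0-9A-Fa-f\s]*)>\s*\]"
)


def _decode_hex_string(raw):
    """Decode the inside of a PDF hex string, ignoring embedded whitespace."""
    digits = b"".join(raw.split())
    if len(digits) % 2:
        digits += b"0"  # a missing final hex digit is treated as 0
    return bytes.fromhex(digits.decode("ascii"))


def find_file_identifier(pdf_bytes):
    """Return (first_id, second_id) from the last /ID array, or None."""
    matches = list(ID_PATTERN.finditer(pdf_bytes))
    if not matches:
        return None  # the entry is optional, so this case must be handled
    last = matches[-1]  # incremental updates append newer trailers at the end
    return _decode_hex_string(last.group(1)), _decode_hex_string(last.group(2))


# Hypothetical trailer, made up for illustration only.
sample_trailer = (
    b"trailer\n<< /Size 12 /Root 1 0 R "
    b"/ID [<0123456789ABCDEF0123456789ABCDEF> "
    b"<FEDCBA9876543210FEDCBA9876543210>] >>\n"
    b"startxref\n1234\n%%EOF\n"
)
ids = find_file_identifier(sample_trailer)
print(ids[0].hex(), ids[1].hex())
```

For anything beyond a quick inspection, a proper PDF library is the safer choice, since a byte-level scan like this can be fooled by stream contents that happen to contain `/ID [`.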

NOTE 2 above is in essence a recommendation to add this optional value, even though it is not formulated using the SHALL/SHOULD/MAY language conventions applied elsewhere in the specification.

The recommendation is more explicit in the referenced section 14.4:

The ID entry is optional but should be used.

In these specifications, "should" denotes a recommendation, and a recommendation is defined as something one has to do unless there are good reasons not to. A PDF writer therefore has to create this entry unless it can justify omitting it (I can hardly think of a valid justification). This should answer the question asked in response to Millie's answer:

any idea why both PdfSharp and phantomjs create it?

In particular, it is not merely considered good practice, as another comment above assumed.

Concerning the contents of the ID array, the specification continues in section 14.4:

The value of this entry shall be an array of two byte strings. The first byte string shall be a permanent identifier based on the contents of the file at the time it was originally created and shall not change when the file is incrementally updated. The second byte string shall be a changing identifier based on the file’s contents at the time it was last updated. When a file is first written, both identifiers shall be set to the same value. If both identifiers match when a file reference is resolved, it is very likely that the correct and unchanged file has been found. If only the first identifier matches, a different version of the correct file has been found.

To help ensure the uniqueness of file identifiers, they should be computed by means of a message digest algorithm ...

The calculation of the file identifier need not be reproducible; all that matters is that the identifier is likely to be unique.
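The rules quoted above can be sketched in Python as follows. This is only a rough illustration: the function names are hypothetical, and the inputs fed into the digest (current time, file path, document information values, plus some random bytes) are my own choice for the example. Since the calculation need not be reproducible, mixing in random bytes is acceptable here.

```python
import hashlib
import os
import time


def new_file_identifier(path, info):
    """Return a 16-byte MD5 digest that is likely to be unique.

    The inputs are an assumption made for this sketch, not a fixed recipe.
    """
    digest = hashlib.md5()
    digest.update(str(time.time_ns()).encode("ascii"))
    digest.update(os.fsencode(path))
    for key in sorted(info):
        digest.update(key.encode("utf-8"))
        digest.update(info[key].encode("utf-8"))
    digest.update(os.urandom(16))  # guards against coarse clock resolution
    return digest.digest()


def initial_id_array(path, info):
    """When a file is first written, both identifiers are the same value."""
    identifier = new_file_identifier(path, info)
    return identifier, identifier


def id_array_after_update(old_ids, path, info):
    """Keep the permanent first identifier, replace the changing second one."""
    return old_ids[0], new_file_identifier(path, info)


def format_id_entry(ids):
    """Render the array as it would appear in the trailer dictionary."""
    return "/ID [<%s> <%s>]" % (ids[0].hex().upper(), ids[1].hex().upper())


created = initial_id_array("report.pdf", {"Title": "Report"})
updated = id_array_after_update(created, "report.pdf", {"Title": "Report v2"})
print(format_id_entry(created))
print(format_id_entry(updated))
```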

Thus, the first article Millie quoted from is not entirely correct when it claims:

the file identifier (the /ID entry from the trailer dictionary). This is an arbitrary string of bytes

The value of the ID entry is not a string but an array of two strings. And the string values are not arbitrary; they are meant to be unique, and the specification recommends obtaining them by hashing. In particular, they must not be re-used for different documents, which would be fine if they were merely arbitrary.

The other article quoted from is also not entirely correct in saying:

a program that makes PDF files is only required to create the file identifier if the file is to be encrypted.

Even when not encrypting, that program needs good reasons not to create file identifiers, because the specification recommends them. Lacking such reasons, a program is therefore required to create the file identifier.

All that being said, any PDF consumer always has to be prepared to find a PDF without a file identifier; there might be a reason for not creating it after all.
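A consumer resolving a file reference could apply the matching rules from section 14.4 roughly like this. It is a minimal sketch: the function name `classify_reference` and the returned labels are hypothetical, and a missing ID array is represented as `None`.

```python
def classify_reference(expected_ids, found_ids):
    """Compare two (first_id, second_id) pairs; either may be None."""
    if expected_ids is None or found_ids is None:
        # No identifier on one side: nothing can be concluded from the IDs.
        return "unknown"
    if expected_ids[0] != found_ids[0]:
        return "different file"
    if expected_ids[1] == found_ids[1]:
        # Both match: very likely the correct and unchanged file.
        return "same file, unchanged"
    # Only the permanent identifier matches.
    return "same file, different version"


original = (b"\x01" * 16, b"\x01" * 16)
revised = (b"\x01" * 16, b"\x02" * 16)
print(classify_reference(original, revised))
print(classify_reference(original, None))
```

Returning an explicit "unknown" rather than raising keeps the missing-identifier case a normal outcome, which matches the point that the entry may legitimately be absent.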


answered Sep 29 '22 11:09

mkl

According to this article:

4. Append the file identifier (the /ID entry from the trailer
   dictionary).  This is an arbitrary string of bytes; Adobe
   recommends that it be generated by MD5 hashing various pieces
   of information about the document.

That passage is about the encryption of PDFs. According to this article, the ID is only needed during encryption:

a program that makes PDF files is only required to create the file identifier if the file is to be encrypted.
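To show where the file identifier enters the encryption, here is a rough Python sketch of the key computation that the first quoted passage is a step of. It follows my reading of the standard security handler at revision 2 (40-bit key); the function name and argument names are hypothetical, and the padding constant and step order should be checked against the specification before being relied on.

```python
import hashlib
import struct

# 32-byte password padding string, as I understand the standard security
# handler to define it. Verify against the specification before use.
PADDING = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def encryption_key_rev2(user_password, o_entry, p_entry, first_id):
    """Derive a 5-byte key; first_id is the first string of the /ID array."""
    padded = (user_password + PADDING)[:32]  # pad or truncate to 32 bytes
    digest = hashlib.md5()
    digest.update(padded)
    digest.update(o_entry)  # value of the /O entry
    # /P is a signed 32-bit integer; hash it as 4 bytes, low-order byte first.
    digest.update(struct.pack("<I", p_entry & 0xFFFFFFFF))
    digest.update(first_id)  # the step quoted above: append the file identifier
    return digest.digest()[:5]


# Made-up inputs, for illustration only.
key_a = encryption_key_rev2(b"", b"\x00" * 32, -44, b"\x11" * 16)
key_b = encryption_key_rev2(b"", b"\x00" * 32, -44, b"\x22" * 16)
print(key_a.hex(), key_b.hex())
```

Because the identifier is hashed into the key, a reader has to be able to read it before it can decrypt anything, which is why it must be stored unencrypted.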

This SO link also has some good information. It states that the ID only needs to be reasonably unique, and it names the specific ISO standard in which to find more details.

answered Sep 29 '22 09:09

Millie Smith

Tags:

pdf

George Mauer
