
Does the %%EOF in a PDF have to appear within the last 1024 bytes of the file?

Tags:

pdf

The QPDF source code I was reading contains this comment about PDFs:

// PDF spec says %%EOF must be found within the last 1024 bytes of
// the file.  We add an extra 30 characters to leave room for the
// startxref stuff.

However, I cannot find anything about this in the PDF 1.7 specification, although a couple of places on the internet also mention it.

My question is: is this true, and if so, where is it specified that %%EOF must appear within the last 1024 bytes?
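To make the QPDF comment concrete, here is a minimal Python sketch of the kind of tail search it describes. This is not QPDF's actual code (QPDF is written in C++). It reads the last 1024 + 30 bytes, finds the last %%EOF there, and parses the byte offset from the startxref line just before it. The function name read_startxref and the regular expression are illustrative assumptions; real parsers are much more forgiving about whitespace, comments and damaged files.

```python
import re

# Hypothetical helper, not QPDF's API. The window mirrors the QPDF comment:
# 1024 bytes for %%EOF plus 30 extra bytes of room for "startxref <offset>".
STARTXREF = re.compile(rb"startxref\s+(\d+)\s+$")


def read_startxref(data: bytes, window: int = 1024 + 30) -> int:
    """Return the cross-reference offset named by the last startxref line."""
    tail = data[-window:]  # the whole file if it is shorter than the window
    eof = tail.rfind(b"%%EOF")
    if eof < 0:
        raise ValueError(f"no %%EOF in the last {window} bytes")
    # The startxref keyword and its offset must end right before %%EOF.
    m = STARTXREF.search(tail[:eof])
    if m is None:
        raise ValueError("no startxref line before %%EOF")
    return int(m.group(1))
```

For example, read_startxref(b"%PDF-1.7\n...\nstartxref\n4242\n%%EOF\n") returns 4242, while a file whose last %%EOF lies further back than the window raises ValueError.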

asked Aug 10 '12 06:08 by Jesse Good


2 Answers

The source code does indeed say that (in libqpdf/QPDF.cc), but ISO 32000-1:2008, the PDF 1.7 standard, has this to say about the file trailer:

7.5.5. File Trailer

The trailer of a PDF file enables a conforming reader to quickly find the cross-reference table and certain special objects. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF.

So, if you're following the standard, it's even more restrictive than you state.
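The strict reading of 7.5.5 can be checked directly. In this sketch, the "last line" is whatever follows the final end-of-line marker (CR LF, LF or CR). It assumes that a single EOL after %%EOF is acceptable, which is common practice but is not spelled out in the quoted text. The name last_line_is_eof is hypothetical.

```python
def last_line_is_eof(data: bytes) -> bool:
    """Strict check: the file's last line contains only %%EOF."""
    # Assumption: tolerate one trailing end-of-line marker after %%EOF.
    if data.endswith(b"\r\n"):
        body = data[:-2]
    elif data.endswith((b"\n", b"\r")):
        body = data[:-1]
    else:
        body = data
    if not body.endswith(b"%%EOF"):
        return False
    # %%EOF must start its own line: preceded by CR/LF or at the file start.
    before = body[:-5]
    return before == b"" or before.endswith((b"\r", b"\n"))
```

Under this check, b"...startxref\n9\n%%EOF\n" passes, but a file with an extra blank line or any other bytes after %%EOF does not.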


Back in Adobe's PDF 1.3 specification, Appendix H (Implementation Notes) contains this snippet about the behaviour of the Acrobat viewer (not the file format):

3.4.4, “File Trailer”

Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.

In other words, it's saying that the viewer (Adobe's implementation) is a little more relaxed in what it will accept. The specification itself, however, still maintains that the %%EOF has to be on its own, on the last line.
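For contrast, the relaxed viewer behaviour from the Appendix H note amounts to a much weaker test. This is only a sketch of the rule as quoted, not Acrobat's actual logic; the name acrobat_accepts_eof is hypothetical.

```python
def acrobat_accepts_eof(data: bytes, window: int = 1024) -> bool:
    """Relaxed rule: %%EOF appears anywhere in the last `window` bytes,
    regardless of what follows it."""
    return b"%%EOF" in data[-window:]
```

A file with a few hundred bytes of trailing padding after %%EOF passes this check but fails the strict "last line" rule from ISO 32000-1.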

That note is still present in Adobe's versions of the file format document up to 1.7. It was, however, removed from the ISO version, since ISO (rightly so) doesn't care one little bit about specific implementations of a product, as long as they conform to the standard as written.

Adobe's documents are available from Adobe, which also has the right to distribute a (slightly modified) version of the ISO 32000 standard.

answered Sep 20 '22 16:09 by paxdiablo


You should also be aware of a standard PDF feature called incremental update.

When a document is incrementally updated, the new, modified version is created by keeping the original data (including its last %%EOF line) and appending any changed or added objects after it, followed at the new end of the file by an additional xref section, an additional trailer and a final %%EOF.

A PDF can receive multiple incremental updates.

This way the first %%EOF can appear well before "the last 1024 bytes of the file".
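A quick way to see this in a given file is to list the offsets of every %%EOF marker. This is a naive byte scan, so treat the result as an estimate: the bytes "%%EOF" could in principle also occur inside stream data or strings, which a real parser (or a tool like pdfresurrect) would handle properly. The name eof_offsets is hypothetical.

```python
def eof_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every %%EOF marker, in file order."""
    offsets = []
    pos = data.find(b"%%EOF")
    while pos != -1:
        offsets.append(pos)
        # Continue searching just past the marker that was found.
        pos = data.find(b"%%EOF", pos + 5)
    return offsets
```

In an incrementally updated file, every offset except the last one lies before the final %%EOF, and those earlier markers can sit far outside the last 1024 bytes.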

The advantage (or disadvantage, depending on your point of view) of this "incremental update" feature is that you can restore the previous version of the PDF file simply by deleting all lines that follow the second-to-last %%EOF (and you can repeat that process until you arrive at the first version of the file).
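Here is a minimal sketch of that restore step: cut the file right after the second-to-last %%EOF line, keeping the end-of-line that terminates it. It assumes that the %%EOF markers found by a plain byte search are real revision boundaries (see the caveat about stray "%%EOF" bytes inside streams). The name previous_revision is hypothetical, and you should run it on a copy of the file.

```python
def previous_revision(data: bytes) -> bytes:
    """Return the file contents up to and including the second-to-last %%EOF line."""
    last = data.rfind(b"%%EOF")
    if last == -1:
        raise ValueError("no %%EOF marker found")
    prev = data.rfind(b"%%EOF", 0, last)
    if prev == -1:
        raise ValueError("only one %%EOF: no earlier revision to restore")
    end = prev + len(b"%%EOF")
    # Keep the end-of-line that terminates the %%EOF line (CR LF, LF or CR).
    if data[end:end + 2] == b"\r\n":
        end += 2
    elif data[end:end + 1] in (b"\n", b"\r"):
        end += 1
    return data[:end]
```

Calling it repeatedly on its own output walks back one revision at a time until only one %%EOF is left, at which point it raises ValueError.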

There is also a command-line tool called pdfresurrect:

  • which can report the number of incremental updates which have been applied to a PDF,
  • which can extract previous versions, and
  • which can "flatten" the history and create a new PDF which only contains the last version.

Is this 'incremental update' feature used much in real-world PDFs?

First: it is used whenever a digital/electronic signature is applied to a PDF.

Second: it is the standard way Adobe Acrobat saves a PDF file when you simply click the Save button. (If you want to avoid incrementally updating the document, use Save As... instead!) One of the few exceptions, where a simple Save in recent versions of Acrobat generates a completely new PDF instead of incrementally updating the file, is after you have deleted whole pages. It seems too many Adobe customers complained about earlier versions: because every incremental update increases the file size, deleting pages gave them larger PDFs that still contained the "deleted" pages.

So beware of accidental information leaks caused by not being aware of the Acrobat behaviour described in the second point above.


Update

I've recently created a hand-coded PDF file for a PDF workshop (video) at the TROOPERS15 conference, which can be used to study the details of this feature:

  • 114_incrementally-updated.pdf (8.3 kB on GitHub)
    (I'd recommend making a backup copy of the file after downloading it. Then simply remove every line after the first %%EOF, save the file, and look at the now-visible content...)
answered Sep 22 '22 16:09 by Kurt Pfeifle