According to the PDF 1.7 specification, page 90, Sec 3.4:
The preceding sections describe the syntax of individual objects. This section describes how objects are organized in a PDF file for efficient random access and incremental update. A canonical PDF file initially consists of four elements (see Figure 3.2):
A one-line header identifying the version of the PDF specification to which the file conforms
A body containing the objects that make up the document contained in the file
A cross-reference table containing information about the indirect objects in the file
A trailer giving the location of the cross-reference table and of certain special objects within the body of the file
Basically, the structure has the header, followed by the body content, then the cross-reference table, and finally the trailer, which gives the location of the xref table. The key part here is that the trailer and xref tables are at the end of the file, and the xref table contains the pertinent metadata of the body content (mainly the 10-digit byte offset).
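To make the layout concrete, here is a minimal Python sketch of how a reader could find the xref table by working backwards from the end of a classic (non-linearized) file. The helper names build_minimal_pdf and read_xref_offsets are made up for this illustration, and the parser assumes a single xref subsection with no incremental updates or cross-reference streams, so treat it as a sketch rather than a real PDF parser.

```python
import re


def build_minimal_pdf() -> bytes:
    """Assemble a tiny classic PDF in memory, recording each object's byte offset."""
    header = b"%PDF-1.7\n"
    objects = [
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n",
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n",
    ]
    body = b""
    offsets = []
    for obj in objects:
        offsets.append(len(header) + len(body))  # offset from start of file
        body += obj
    xref_pos = len(header) + len(body)
    # Each xref entry is 20 bytes: 10-digit offset, 5-digit generation, n/f flag.
    xref = b"xref\n0 3\n0000000000 65535 f \n"
    for off in offsets:
        xref += b"%010d 00000 n \n" % off
    trailer = (
        b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % xref_pos
    )
    return header + body + xref + trailer


def read_xref_offsets(pdf: bytes) -> dict[int, int]:
    """Return {object number: byte offset} for the in-use entries.

    Assumption: one xref table with one subsection (true for the sample above).
    """
    # The trailer sits at the very end, so only the tail needs to be inspected.
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF", pdf[-1024:])
    if match is None:
        raise ValueError("startxref not found near end of file")
    pos = int(match.group(1))
    if not pdf.startswith(b"xref", pos):
        raise ValueError("startxref does not point at an xref table")
    lines = pdf[pos:].splitlines()
    first, count = map(int, lines[1].split())  # subsection header, e.g. b"0 3"
    offsets = {}
    for i, line in enumerate(lines[2:2 + count]):
        offset, _generation, kind = line.split()
        if kind == b"n":  # "n" = in use, "f" = free
            offsets[first + i] = int(offset)
    return offsets


pdf = build_minimal_pdf()
for number, offset in read_xref_offsets(pdf).items():
    print(number, offset)
```

Note that the reader has to see the last bytes of the file before it can locate anything else, which is exactly why this layout is awkward for a partially downloaded file.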
Given that the xref table itself is located at the very end of a PDF file:
See screenshot of my partially downloaded PDF file:
Try resetting the display preference in your browser to clear up the viewing issue. In Reader or Acrobat, right-click the document window, and choose Page Display Preferences. From the list at left, select Internet. Deselect Display PDF in browser, and then click OK.
The type of PDF files the OP describes is also known as "web optimized" (marketing term) or "linearized" (technical term in PDF parlance).
It has to be noted that it only works if two extra conditions (on top of the linearization feature of the files) are met:
If byte-streaming is not supported by the server or if the PDF file is not linearized, the entire file still needs to be downloaded completely before the viewer can display any page.
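As a rough illustration of what byte-streaming means at the HTTP level, the sketch below builds a Range request header and checks whether a response advertises range support through Accept-Ranges. Both function names are hypothetical helpers for this example, no network call is made, and a real viewer would also have to handle 206 Partial Content responses and servers that ignore the header.

```python
def range_header(offset: int, length: int) -> dict[str, str]:
    """Build an HTTP Range header asking for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def server_supports_ranges(response_headers: dict[str, str]) -> bool:
    """True if the response headers advertise byte-range support."""
    for name, value in response_headers.items():
        if name.lower() == "accept-ranges":
            units = [unit.strip().lower() for unit in value.split(",")]
            return "bytes" in units
    return False


print(range_header(0, 1024))
print(server_supports_ranges({"Accept-Ranges": "bytes"}))
print(server_supports_ranges({"Accept-Ranges": "none"}))
```

A viewer could use the first helper to fetch just the first 1024 bytes of a file, and later the byte span of any individual object once it knows the offsets.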
The description about the PDF file structure quoted by the OP does not apply to linearized PDF files. These are organized in a slightly different way:
Regarding the additional structures, a linearized PDF contains its objects in two groups:
In the first group is the document catalogue, all document-level objects, and all objects belonging to the first-to-be-displayed page (not necessarily "page 0"!). The objects shall be numbered sequentially.
The second group holds all the other objects.
These groups shall be indexed by two xref table sections.
One xref section appears immediately after the first indirect object, very close to the beginning of the file.
The other xref section is positioned at the end of the file (just as in standard, non-linearized PDFs).
The first object immediately after the %PDF-1.x header line shall contain a dictionary key indicating the /Linearized property of the file.
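A quick way to check for this marker is to look only at the start of the file. The Python sketch below is a heuristic, not a validator: looks_linearized is a made-up name, it relies on my reading of Appendix F that the linearization dictionary must lie within the first 1024 bytes, and it does not verify the other dictionary entries (such as the file length in /L).

```python
import re


def looks_linearized(first_bytes: bytes) -> bool:
    """Heuristic: does the first object after the header carry /Linearized?"""
    head = first_bytes[:1024]
    if not head.startswith(b"%PDF-"):
        return False
    # Find the start of the first indirect object, e.g. b"1 0 obj".
    match = re.search(rb"\d+\s+\d+\s+obj", head)
    if match is None:
        return False
    # Only inspect that first object, up to its endobj (or the end of the window).
    end = head.find(b"endobj", match.end())
    first_object = head[match.end():end if end != -1 else len(head)]
    return b"/Linearized" in first_object


linearized = (
    b"%PDF-1.7\n1 0 obj\n"
    b"<< /Linearized 1 /L 1234 /H [ 500 100 ] /O 3 /E 900 /N 1 /T 1000 >>\n"
    b"endobj\n"
)
plain = b"%PDF-1.7\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(looks_linearized(linearized))
print(looks_linearized(plain))
```

Because only the first kilobyte is needed, a viewer can run a check like this as soon as the first chunk of the download arrives.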
This overall structure allows a conforming reader to learn the complete list of object addresses very quickly, without needing to download the complete file from beginning to end:
The viewer can display the first page(s) very fast, before the complete file is downloaded.
The user can click on a thumbnail page preview (or a link in the ToC of the file) in order to jump to, say, page 445, immediately after the first page(s) have been displayed, and the viewer can then request all the objects required for page 445 by asking the remote server via byte range requests to deliver these "out of order" so the viewer can display this page faster. (While the user reads pages out of order, the downloading of the complete document will still go on in the background...)
The technical details of PDF "linearization" can be found in the 'normative' Appendix F of Adobe's original PDF 1.7 Specification (ca. 11 MByte -- which in itself is an example of such a linearized PDF file!)