Minimal PDF example in PDF specification

I took the minimal PDF example from the PDF specification, pasted it into Notepad, and saved the file with a .pdf extension.

I can open it with other PDF viewers (PDF-XChange, SumatraPDF, MuPDF), but when I open it with Adobe Reader, it says the file is broken.

I am not sure whether the other viewers simply treat this "broken" file as a blank file.

The file is supposed to display one blank page, since it is a minimal example.

In fact, I modified the minimal example. When I copy it from the PDF specification into Notepad and open the resulting .txt file in a hex editor, I see that each line break in the .txt file shows up as two characters. For example,

1 0 obj
<< /Type /Catalog

gives me (in Hex Editor)

1 0 obj  << /Type /Catalog

which is (in hex values)

31 20 30 20 6F 62 6A 0D 0A 3C 3C 20 2F 54 79 70
65 20 2F 43 61 74 61 6C 6F 67

The two characters between j and < are 0D 0A (carriage return and line feed).
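The hex dump above can be reproduced without a hex editor. This is only a small sketch, assuming the two lines are joined by a Windows-style line break; the variable names are made up for illustration.

```python
# Two lines from the example, joined by CR LF as Notepad would store them.
text = "1 0 obj\r\n<< /Type /Catalog"
data = text.encode("ascii")

# Render every byte as two uppercase hex digits, like a hex editor does.
hex_dump = " ".join(f"{b:02X}" for b in data)
print(hex_dump)

# Bytes 7 and 8 sit between 'j' and '<': they are the line break itself.
line_break = data[7:9]
print(line_break)
```

The two bytes between 6A (j) and 3C (<) are the line break, not spaces.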

So I removed the line breaks in Notepad and adjusted the offset values in the xref section accordingly.

Below is the full code.

Do you know what's wrong with this example? Why does Adobe Reader say it is broken? Is this because I gave the wrong values in xref?

%PDF-1.4 1 0 obj << /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >> endobj 2 0 obj << /Type Outlines /Count 0 >> endobj 3 0 obj << /Type /Pages /Kids [4 0 R] /Count 1 >> endobj 4 0 obj << /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R /Resources << /ProcSet 6 0 R >> >> endobj 5 0 obj << /Length 35 >> stream … Page-marking operators … endstream endobj 6 0 obj [/PDF] endobj xref 0 7 0000000000 65535 f 0000000009 00000 n 0000000074 00000 n 0000000119 00000 n 0000000176 00000 n 0000000295 00000 n 0000000373 00000 n trailer << /Size 7 /Root 1 0 R >> startxref 395 %%EOF
user565739 asked Sep 30 '12 15:09


1 Answer

First: when you 'copied' the example from the PDF specification, a few things very likely happened that keep your copy from working as expected:

  • ...you didn't 'copy' by re-typing the example in a text editor, but
  • ...you used copy'n'paste, using a PDF as the source file.

Depending on your text editor, that method probably changed the newline convention from [cr]+[lf] to a single [cr] or [lf], or vice versa. Every such change adds or removes one byte per line, which in turn means that the byte offsets in the object 'table of contents' (the 'xref' table) are no longer valid.
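To see how a changed newline convention shifts the offsets, here is a rough sketch. The helper `object_offsets` and the two-object sample are hypothetical, and the regular expression only handles objects that start at the beginning of a line.

```python
import re

def object_offsets(data: bytes) -> dict:
    """Map each object number to the byte offset of its 'N G obj' line."""
    return {
        int(m.group(1)): m.start()
        for m in re.finditer(rb"(?m)^(\d+) (\d+) obj\b", data)
    }

# The same tiny fragment, once with LF and once with CR LF line endings.
lf_version = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n[/PDF]\nendobj\n"
)
crlf_version = lf_version.replace(b"\n", b"\r\n")

lf_offsets = object_offsets(lf_version)
crlf_offsets = object_offsets(crlf_version)
print(lf_offsets)    # offsets with one byte per line break
print(crlf_offsets)  # offsets with two bytes per line break
```

Object 2 moves further than object 1 because more line breaks precede it, so an xref table written for one convention cannot simply be shifted by a constant.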

Another problem with the PDF source code you posted is that it no longer contains any line breaks at all. Some viewers may still parse it silently, but not all can. And it certainly violates the spec, because section 7.5.2 clearly spells out that

"The first line of a PDF file shall be a header consisting of the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7."

Your header violates that rule: without any line breaks, the first line is the entire file.
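A strict reading of that header rule can be sketched as follows. The function name is hypothetical, and real viewers are usually more lenient than this check.

```python
import re

# '%PDF-' followed by a version of the form 1.N, with N between 0 and 7.
HEADER_RE = re.compile(rb"%PDF-1\.[0-7]")

def has_valid_header_line(data: bytes) -> bool:
    """Return True if the first line consists of the PDF header and nothing else."""
    # PDF accepts CR LF, CR, or LF as end-of-line markers.
    first_line = re.split(rb"\r\n|\r|\n", data, maxsplit=1)[0]
    return HEADER_RE.fullmatch(first_line) is not None

one_line_file = b"%PDF-1.4 1 0 obj << /Type /Catalog >> endobj"
multi_line_file = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"

one_line_ok = has_valid_header_line(one_line_file)
multi_line_ok = has_valid_header_line(multi_line_file)
print(one_line_ok, multi_line_ok)
```

The one-line variant fails because its first line carries everything that follows the version number.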

Also, the 'stream' in 5 0 obj isn't valid PDF code; it is just placeholder text (… Page-marking operators …). Some viewers may choke when they come across such 'garbage'.

Lastly, your startxref value wasn't correct.
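A quick way to check a startxref value is to compare it with the position where the xref keyword actually sits. This is a minimal sketch for a file with a single xref table; `check_startxref` and the sample bytes are made up for illustration.

```python
import re

def check_startxref(data: bytes) -> tuple:
    """Return (declared, actual) byte offsets of the xref table."""
    declared_match = re.search(rb"startxref\s+(\d+)", data)
    # '^xref' at a line start cannot match the 'startxref' line.
    actual_match = re.search(rb"(?m)^xref\b", data)
    if declared_match is None or actual_match is None:
        raise ValueError("no xref table or startxref entry found")
    return int(declared_match.group(1)), actual_match.start()

sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n999\n%%EOF\n"
)

declared, actual = check_startxref(sample)
print(declared, actual)  # the two numbers must be equal in a valid file
```

In the sample the declared value is deliberately wrong, so the two numbers differ.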

So here is a file that works. I repaired it in a text editor, and I put your original code as a comment after the %%EOF for comparison and reference:

%PDF-1.4
1 0 obj
<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>
endobj
2 0 obj
<< /Type Outlines /Count 0 >>
endobj
3 0 obj
<< /Type /Pages /Kids [4 0 R] /Count 1 >>
endobj
4 0 obj
<< /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R /Resources << /ProcSet 6 0 R >> >>
endobj
5 0 obj
<< /Length 35 >>
stream
… Page-marking operators …
endstream 
endobj
6 0 obj
[/PDF]
endobj
xref
0 7
0000000000 65535 f 
0000000009 00000 n 
0000000074 00000 n 
0000000119 00000 n 
0000000176 00000 n 
0000000295 00000 n 
0000000376 00000 n 
trailer 
<< /Size 7 /Root 1 0 R >>
startxref
394
%%EOF

%% %PDF-1.4 1 0 obj << /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >> endobj 2 0 obj << /Type Outlines /Count 0 >> endobj 3 0 obj << /Type /Pages /Kids [4 0 R] /Count 1 >> endobj 4 0 obj << /Type /Page /Parent 3 0 R /MediaBox [0 0 612 792] /Contents 5 0 R /Resources << /ProcSet 6 0 R >> >> endobj 5 0 obj << /Length 35 >> stream … Page-marking operators … endstream endobj 6 0 obj [/PDF] endobj xref 0 7 0000000000 65535 f 0000000009 00000 n 0000000074 00000 n 0000000119 00000 n 0000000176 00000 n 0000000295 00000 n 0000000373 00000 n trailer << /Size 7 /Root 1 0 R >> startxref 395
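Instead of counting bytes by hand, the offsets can be computed while the file is assembled. The following is only a sketch under simplifying assumptions: `build_pdf` is a hypothetical helper, all generation numbers are 0, and the three objects describe a simplified blank page rather than the exact example from the specification.

```python
def build_pdf(objects: list) -> bytes:
    """Join serialized object bodies and append a matching xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where 'N 0 obj' starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0, always free
    for off in offsets:
        # Each entry is exactly 20 bytes, including the trailing space and LF.
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
]
pdf = build_pdf(objects)
print(pdf.decode("ascii"))
```

When saving the result, open the file in binary mode ("wb") so that no newline translation changes the byte offsets again.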
Kurt Pfeifle answered Sep 27 '22 20:09