I'm trying to read the xref table of a pdf version >= 1.5.
the xref table is an object:
58 0 obj <</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/ID[<CB05990F613E2FCB6120F059A2BCA25B><E2ED9D17A60FB145B03010B70517FC30>]/Index[38 39]/Info 37 0 R/Length 96/Prev 67529/Root 39 0 R/Size 77/Type/XRef/W[1 2 1]>>stream hÞbbd``b`:$AD`Ì ‰Õ Vˆ8âXAÄ×HÈ$€t¨ – ÁwHp·‚ŒZ$ìÄb!&F† .#5‰ÿŒ>(more here but can't paste) endstream endobj
as you can see
BUT :
the decompressed stream is 195 bytes long (39 * 5 = 195). So the length of an entry is 4 or 5.
Here is the first inflated bytes
02 01 00 10 00 02 00 02 cd 00 02 00 01 51 00 02 00 01 70 00 02 00 05 7a 00 02 ^^
if entry length is 4 then the root entry is a free object (see the ^^) !!
if the entry is 5: how to interpret the fields of one entry (reference is implicitly made to PDF Reference, chapter 3.4.7 table 3.16 ) ?
For object 38, the first of the stream: it seems, as it is of type 2, to be the 16 object of the stream object number 256, but there is no object 256 in my pdf file !!!
The question is: how shall I handle the 195 bytes ?
A compressed xref table may have been compressed with one of the PNG filters. If the /Predictor
value is set to '10' or greater ("a Predictor value greater than or equal to 10 merely indicates that a PNG predictor is in use; the specific predictor function used is explicitly encoded in the incoming data")1, PNG row filters are supplied inside the compressed data "as usual" (i.e., in the first byte of each 'row', where the 'row' is of the width in /W
).
Width [1 2 1] plus Predictor byte:
02 01 00 10 00
02 00 02 cd 00
02 00 01 51 00
02 00 01 70 00
02 00 05 7a 00
02 .. .. .. ..
After applying the row filters ('2', or 'up', for all of these rows), you get this:
01 00 10 00
01 02 ed 00
01 03 3e 00
01 04 ae 00
01 09 28 00
.. .. .. ..
Note: calculated by hand; I might have made the odd mistake here and there. Note that the PNG 'up' filter is a byte filter, and the result of the "up" filter is truncated to 8 bits for each addition.
This leads to the following Type 1 XRef references ("type 1 entries define objects that are in use but are not compressed (corresponding to n entries in a cross-reference table)."):2
#38 type 1: offset 10h, generation 0
#39 type 1: offset 2EDh, generation 0
#40 type 1: offset 33Eh, generation 0
#41 type 1: offset 4AEh, generation 0
#42 type 1: offset 928h, generation 0
1 See LZW and Flate Predictor Functions in PDF Reference 1.7, 6th Ed, Section 3.3: Filters.
2 As described in your Table 3.16 in PDF Ref 1.7.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With