I'm trying to figure out where in an uncompressed PDF v1.4 document the Times font is used.
The /Font object describing the Times font within the PDF is object 65, as follows:
65 0 obj
<</Type /Font
/Subtype /TrueType
/BaseFont /PXAAAD+TimesNewRoman,Italic
/FirstChar 1
/LastChar 35
/Widths [250 333 333 333 500 500 500 500 500 500 500 500 500 500 333 722 722 833 666 610 500 556 500 443 443 500 277 443 500 389 389 277 500 443 500]
/FontDescriptor 205 0 R
/ToUnicode 206 0 R>>
endobj
It refers to a /FontDescriptor (object 205) that further defines the Times font object, and to a /ToUnicode map in object 206, which describes the byte-to-Unicode character mapping.
EDIT: After Ritsaert's initial answer to the question below, I'm adding the font's /ToUnicode object here, to provide the mentioned CMap.
206 0 obj
<</Length 208 0 R>>
stream
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS)
/Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
35 beginbfchar
<01> <0020>
<02> <0028>
<03> <0029>
<04> <002d>
<05> <0030>
<06> <0031>
<07> <0032>
...
<23> <0101>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
endstream
endobj
I've now tracked down the use of the Times font object to a /Page object (one of many) like the following one, which refers to font object 65 through the /F4 reference in its page /Resources:
12 0 obj
<</Type /Page
/Parent 2 0 R
/MediaBox [0 0 432 648]
/Contents 92 0 R
/Resources <</Font <</F1 62 0 R
/F3 64 0 R
/F4 65 0 R>>
/ProcSet [/PDF /Text]>>
/Group <</S /Transparency
/CS /DeviceRGB>>>>
endobj
The /Contents stream (object 92 in the PDF file) is then full of text objects (enclosed in BT and ET), none of which contains readable text; instead, they use angle brackets full of numbers. For example, here is the only reference to the Times font /F4 whose use I'm trying to find:
92 0 obj
<</Length 93 0 R>>
stream
...
BT
0.5020 g
72.0000 615.1512 Td
/F4 12.0000 Tf
<0605> Tj
ET
...
endstream
endobj
But what do the angle brackets and the number <0605> refer to? A specific glyph in the font table? Looking at the PDF reference and section 5.3.2, I can't find any mention of the angle brackets.
EDIT: Given the above code and the accepted answer that <0605> is a hex encoding of text, the string <0605> consists of the entries <06> and <05> in the CMap object 206, which map to the Unicode code points <0031> and <0030> respectively. That means the string <0605> refers to U+0031 (a "1") and U+0030 (a "0"), so the Times font is used for the string "10" on page object 12.
What is going on here:
In the content stream, the Tj command is given the string <0605> to draw. A string between < and > is a hex string, so the characters with codes 6 and 5 are drawn. The notation is explained in section 3.2.3 of the linked PDF reference.
Just before the text-drawing command, the font F4 is selected using the Tf command.
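To find every place a given font is used, one can walk the content stream, remember the font most recently selected by Tf, and collect the hex strings handed to Tj while that font is active. The following is a minimal sketch under several assumptions: the stream is already uncompressed text, only hex strings shown with Tj are of interest (literal strings in parentheses, TJ arrays, and q/Q state saving are ignored), and the function name strings_shown_with is made up for illustration.

```python
import re

# Names (/F4), hex strings (<0605>), and bare words (numbers and operators).
TOKEN = re.compile(r"/[^\s<>\[\]()/]+|<[0-9A-Fa-f\s]*>|[^\s<>\[\]()/]+")
NUMBER = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)")

def strings_shown_with(content: str, font_name: str) -> list[str]:
    """Return the hex strings passed to Tj while font_name is selected."""
    current_font = None
    operands = []  # operands collected since the last operator
    found = []
    for tok in TOKEN.findall(content):
        if tok == "Tf":
            # Tf takes two operands: the font resource name, then the size.
            current_font = operands[0] if operands else None
            operands = []
        elif tok == "Tj":
            if current_font == font_name and operands and operands[-1].startswith("<"):
                found.append(operands[-1])
            operands = []
        elif tok.startswith(("/", "<")) or NUMBER.fullmatch(tok):
            operands.append(tok)
        else:
            operands = []  # any other operator: discard its operands
    return found

# The text object from object 92 in the question.
sample = """BT
0.5020 g
72.0000 615.1512 Td
/F4 12.0000 Tf
<0605> Tj
ET"""
print(strings_shown_with(sample, "/F4"))
```

Run over each page's content stream, with the resource name that the page's /Resources dictionary assigns to object 65, this would list the strings drawn in the Times font. Note that the resource name (here /F4) may differ from page to page.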
In the page's resource dictionary, F4 refers to object 65, generation 0. This font object is a subsetted TrueType font in which character codes 1 to 35 are defined. No Encoding is specified, so the font's built-in encoding is used rather than a standard one such as WinAnsiEncoding. In other words, the embedded subset font arranges its characters in a non-standard order, which occurs quite often.
Now, if you want to know how these character codes are linked to Unicode characters: the font has a ToUnicode entry that points to a stream containing a CMap, which defines the mapping. This should be sufficient to convert the string to a Unicode string.
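As a rough illustration of that conversion, the sketch below reads the beginbfchar entries of a ToUnicode CMap into a dictionary and uses it to decode a hex string. It assumes one-byte character codes (which matches the <00> <FF> codespace range of object 206) and handles only bfchar sections, not bfrange. The function names are invented for this example, and the excerpt contains only entries that are visible in the question.

```python
import re

def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Map each source code in beginbfchar/endbfchar sections to its Unicode text."""
    mapping = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            # The destination is UTF-16BE and may hold more than one character.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode_hex_string(hex_string: str, mapping: dict[int, str]) -> str:
    """Decode a PDF hex string such as <0605>, assuming one-byte codes."""
    digits = "".join(hex_string.strip("<>").split())
    if len(digits) % 2:
        digits += "0"  # an odd final digit is treated as if followed by 0
    # Codes missing from the map become U+FFFD (replacement character).
    return "".join(mapping.get(code, "\ufffd") for code in bytes.fromhex(digits))

# Excerpt of object 206, limited to entries shown in the question.
cmap_excerpt = """
3 beginbfchar
<01> <0020>
<05> <0030>
<06> <0031>
endbfchar
"""
to_unicode = parse_bfchar(cmap_excerpt)
print(decode_hex_string("<0605>", to_unicode))  # prints 10
```

Code <06> maps to U+0031 and code <05> to U+0030, so the string drawn with F4 is "10".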