
The 14 standard PDF fonts and character encoding

I'm having difficulty producing PDFs that make use of the 14 standard PDF fonts. Let's use Times-Roman as an example.

I create a Font dictionary with Subtype Type1 and BaseFont set to Times-Roman. If I omit the Encoding entry from the Font dictionary, or add an Encoding dictionary without a BaseEncoding entry, the PDF viewer application should use the font's built-in encoding. For Times-Roman, this is AdobeStandardEncoding.

This works fine for ASCII characters. However, something more exotic like the 'fi' ligature (AdobeStandardEncoding code 174) is not displayed correctly by all PDF viewers:

  • Adobe Reader shows ® (Unicode code point 174, U+00AE) for Times-Roman and Ă for Times-Italic
  • SumatraPDF (wine) shows ® for both fonts
  • Mozilla's PDF.js shows the 'AE' ligature for both fonts
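One possible reading of these symptoms, sketched in Python: code 174 means 'fi' only under StandardEncoding, while other single-byte encodings assign the same code to different characters. Whether the viewers actually fall back to these particular encodings is an assumption, not something established here. The `STANDARD_ENCODING_EXCERPT` table is a hand-copied two-entry excerpt, and Python's `cp1252` and `mac_roman` codecs only approximate WinAnsiEncoding and MacRomanEncoding.

```python
# Hand-copied excerpt of StandardEncoding (code -> glyph name); illustration only.
STANDARD_ENCODING_EXCERPT = {174: "fi", 175: "fl"}

code = 174
raw = bytes([code])

# The same code under three different single-byte encodings.
as_standard = STANDARD_ENCODING_EXCERPT[code]  # glyph name 'fi'
as_winansi = raw.decode("cp1252")              # '®' (U+00AE)
as_macroman = raw.decode("mac_roman")          # 'Æ' (U+00C6)

print("StandardEncoding:", as_standard)
print("cp1252 (close to WinAnsiEncoding):", as_winansi)
print("mac_roman (close to MacRomanEncoding):", as_macroman)
```

The ® and Æ results match what Adobe Reader, SumatraPDF and PDF.js display, which is at least consistent with the code being interpreted under the wrong encoding.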

All other PDF viewers I've tried display the 'fi' ligature properly. They also display the € symbol correctly, which is additionally mapped using the Differences array in the Encoding dictionary (because it is not included in AdobeStandardEncoding):

  • Apple Preview/Skim
  • GhostScript
  • PDF-XChange Viewer (wine)
  • Foxit Reader (wine)
  • Chromium's internal PDF viewer
  • Evince (homebrew)
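The extra € mapping mentioned above could be written as an Encoding dictionary along these lines. This is a minimal sketch: the helper name `encoding_dict` and the choice of code 128 for `Euro` are assumptions made for illustration, since the post does not say which code was used.

```python
def encoding_dict(differences):
    """Serialize an Encoding dictionary without a BaseEncoding entry.

    `differences` maps character codes (0-255) to glyph names. Consecutive
    codes share one starting number, as the Differences syntax allows.
    """
    parts = []
    previous = None
    for code in sorted(differences):
        if previous is None or code != previous + 1:
            parts.append(str(code))  # start a new run at this code
        parts.append("/" + differences[code])
        previous = code
    return "<< /Type /Encoding /Differences [" + " ".join(parts) + "] >>"


# Code 128 is a hypothetical choice; any code unused by the text would do.
print(encoding_dict({128: "Euro"}))
# << /Type /Encoding /Differences [128 /Euro] >>
```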

Opening Adobe Reader's Document Properties window shows:

Times-Roman
    Type: Type1
    Encoding: Custom
    Actual Font: Times-Roman
    Actual Font Type: TrueType

I suspect the fact that a TrueType font is being used instead of a Type1 font might be related to the problem. The PDF specification says:

StandardEncoding Adobe standard Latin-text encoding. This is the built-in encoding defined in Type 1 Latin-text font programs (but generally not in TrueType font programs).

It also says WinAnsiEncoding and MacRomanEncoding can be used with TrueType fonts. So should we avoid using the built-in or StandardEncoding for the standard 14 fonts? Its effects seem to be undefined. It seems Adobe Reader doesn't bother performing a proper mapping from glyph names to glyphs in the TrueType font being used.

Will providing a Differences array when using the Win or Mac encodings produce proper results? Since these map character codes to Type1/PostScript glyph names, there is no direct link to TrueType glyphs.

EDIT: Hmm, I have a feeling the FontDescriptor Flags might be important for these standard fonts. Up to now I have set Flags to 4 (the Symbolic flag) for all fonts, which seemed to work fine for TrueType/OpenType fonts.

Brecht Machiels, asked Apr 07 '16 20:04


1 Answer

It turns out that the Flags entry in the FontDescriptor dictionary is important. For Times, the Nonsymbolic flag (bit 6, value 32) needs to be set rather than the Symbolic flag (bit 3, value 4). The fact that Times is actually being typeset using a TrueType font has nothing to do with it.
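To make the bit arithmetic concrete, here is a small Python sketch of the flag values. The bit positions follow the FontDescriptor Flags table in the PDF specification, where bit n has value 1 << (n - 1). The `FontFlags` class name is hypothetical, and combining Serif and Italic with Nonsymbolic for the Times fonts is an assumption; the only change reported as necessary here is setting Nonsymbolic.

```python
from enum import IntFlag


class FontFlags(IntFlag):
    """A subset of the FontDescriptor flags; bit n has value 1 << (n - 1)."""
    FIXED_PITCH = 1 << 0   # bit 1
    SERIF = 1 << 1         # bit 2
    SYMBOLIC = 1 << 2      # bit 3 (the value 4 used before the fix)
    SCRIPT = 1 << 3        # bit 4
    NONSYMBOLIC = 1 << 5   # bit 6
    ITALIC = 1 << 6        # bit 7


# Assumed flag sets for the two fonts discussed; only NONSYMBOLIC is
# confirmed as required.
times_roman = FontFlags.SERIF | FontFlags.NONSYMBOLIC
times_italic = times_roman | FontFlags.ITALIC

print(int(FontFlags.SYMBOLIC))  # 4
print(int(times_roman))         # 34
print(int(times_italic))        # 98
```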

To use the built-in encoding of the font, the Encoding entry of the Type1 Font dictionary should not be set. You may only add an Encoding dictionary (with BaseEncoding omitted) if it contains a non-empty Differences array; otherwise Adobe Reader reports an error.
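The rule can be sketched as a small Python helper that writes the Encoding entry only when there is something to put in the Differences array. The function name `type1_font_dict` and the sample mapping `128 /Euro` are hypothetical; this illustrates the rule and is not the code actually used to generate the PDFs.

```python
def type1_font_dict(base_font, differences=""):
    """Serialize a Type1 Font dictionary for a standard 14 font.

    `differences` is the already-serialized content of a Differences array,
    for example "128 /Euro". When it is empty, no Encoding entry is written,
    so the viewer falls back to the font's built-in encoding.
    """
    entries = ["/Type /Font", "/Subtype /Type1", "/BaseFont /" + base_font]
    if differences.strip():
        # BaseEncoding is deliberately omitted to keep the built-in encoding.
        entries.append(
            "/Encoding << /Type /Encoding /Differences [" + differences + "] >>"
        )
    return "<< " + " ".join(entries) + " >>"


print(type1_font_dict("Times-Roman"))
print(type1_font_dict("Times-Roman", "128 /Euro"))
```

The first call produces a dictionary with no Encoding entry at all; the second adds an Encoding dictionary that carries only the Differences array.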

With these precautions, the generated PDF displays correctly on all 9 viewer applications listed above.

Brecht Machiels, answered Oct 23 '22 04:10