The PDF viewer 'Evince' on Linux cannot display some math symbols correctly [closed]

Tags:

linux

pdf

fonts

I use Evince to view PDF files on Ubuntu Linux 10.04. But sometimes the program cannot display math symbols correctly.

For example, this PDF file, which can be downloaded from:

  • http://jmlr.csail.mit.edu/papers/volume12/zhang11a/zhang11a.pdf

See equation (1).

On Windows, Acrobat Reader displays the equation correctly. But on Linux, Evince displays the \sum as a solid dot.

I have already installed packages: ttf-symbol-replacement, libpoppler5, poppler-data.

Yun Huang asked Apr 23 '12 08:04


1 Answer

The reason your PDF file doesn't render correctly in every PDF viewer is this: your 'zhang11a.pdf' does not have all of its fonts embedded. In particular, the Symbol font is not embedded, as you can see from the following list:

kupfe@abc:~$ pdffonts zhang11a.pdf 
name                             type      emb sub uni object ID
-------------------------------- --------- --- --- --- ---------
NXDEKT+CMSY10                    Type 1C   yes yes yes     11  0
Times-Italic                     Type 1    no  no  no      10  0
Times-Bold                       Type 1    no  no  no       9  0
Times-Roman                      Type 1    no  no  no       8  0
UYBJCW+MSBM10                    Type 1C   yes yes no      29  0
QEAPRL+CMR10                     Type 1C   yes yes no      23  0
OBCIBS+CMMI10                    Type 1C   yes yes yes     25  0
Symbol                           Type 1    no  no  no      33  0
OUPZTL+ZapfChancery-MediumItalic Type 1C   yes yes no      27  0
CFICWF+CMEX10                    Type 1C   yes yes no      31  0
XRVDJC+CMMI7                     Type 1C   yes yes no      56  0
JQSOYL+CMMI10                    Type 1C   yes yes no      54  0
UWKDHL+CMBX10                    Type 1C   yes yes no      58  0
AIYCES+CMMI5                     Type 1C   yes yes no      60  0
SDIKLH+CMEX9                     Type 1C   yes yes no      72  0
EKRXFC+CMSS10                    Type 1C   yes yes no      84  0
Courier                          Type 1    no  no  no      91  0
Helvetica                        Type 1    no  no  no      97  0
UELPFP+CMMI10                    Type 1C   yes yes no     135  0
VZIXBZ+CMR10                     Type 1C   yes yes no     133  0
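
To pick out only the non-embedded fonts from a listing like the one above, the output can be filtered with awk. This is a minimal sketch, not part of the original answer: the function name `list_unembedded_fonts` is made up for illustration, and it assumes the `emb` column is always the fifth field from the end of each row (which holds for the layout shown above, where the type column may contain a space).

```shell
# Hypothetical helper: read `pdffonts` output on stdin and print the
# names of all fonts whose "emb" column says "no".
# Counting fields from the end avoids trouble with two-word type
# names such as "Type 1C".
list_unembedded_fonts() {
    awk 'NR > 2 && NF >= 7 && $(NF-4) == "no" { print $1 }'
}

# Usage (assumes pdffonts from poppler-utils is installed):
#   pdffonts zhang11a.pdf | list_unembedded_fonts
```

For the listing above this should print Times-Italic, Times-Bold, Times-Roman, Symbol, Courier and Helvetica, one per line.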

Now, if a PDF reader encounters a font that is not embedded, it uses a strategy similar to the following:

  • (1) It searches the local OS for a font with a matching type and name and uses that to render the text. If that doesn't succeed...
  • (2) It searches the local OS for a font with a matching name (possibly of a different font type). If that is not successful...
  • (3) It searches for an appropriate substitute font, one whose metrics are close to those of the original font (the original font's metric information should be embedded in the PDF even if the font file itself isn't). If that is not successful...
  • (4) It uses Courier to render the text.
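
On a typical Linux desktop, the lookups in steps (1) to (3) usually go through fontconfig, and to my knowledge Poppler (the rendering library behind Evince) asks fontconfig for a match when a font is not embedded. Under that assumption, `fc-match` shows which installed font would stand in for a given name. The function name `show_substitutes` is made up for this sketch, and what fontconfig reports is not guaranteed to be exactly what a particular viewer ends up using.

```shell
# Hypothetical helper: for each font name given as an argument, print
# the font that fontconfig would select as the best match.
show_substitutes() {
    if ! command -v fc-match >/dev/null 2>&1; then
        echo "fc-match not found (it is part of fontconfig)" >&2
        return 127
    fi
    for name in "$@"; do
        # fc-match prints: <file>: "<family>" "<style>"
        printf '%s -> %s\n' "$name" \
            "$(fc-match "$name" 2>/dev/null || echo '(no match)')"
    done
}

# Usage, with the non-embedded names from the listing above:
#   show_substitutes Symbol Times-Roman Helvetica Courier
```

If the line for Symbol names an ordinary text font rather than a symbol font, that would be consistent with the wrong glyph showing up in place of ∑.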

My hypothesis for the root cause of your problem is:

The glyph for the ∑ character is missing from your ttf-symbol-replacement font, or this glyph is found at a different spot in that replacement font's glyph table.

Hence it is not Evince's fault for not being able to render that file correctly.

On the other hand, Acrobat Reader does ship with application-embedded instances of Times, Courier, Helvetica and Symbol, so that it can render such PDFs. So AcroRead does not have a problem with this file. (And Evince cannot use such tricks due to the licenses of these fonts...)

Mark my words:
If you want foolproof PDF files that can be rendered (and printed) correctly by each and every PDF reader on all operating systems, make sure your PDF has embedded all fonts it uses!

Repairing your zhang11a.pdf

However, it is possible to repair your problematic PDF with the help of Ghostscript. I used this command on an Ubuntu Oneiric system (which uses Ghostscript v9.02) to do this:

/usr/bin/gs \
  -o gs-repaired---zhang11a.pdf \
  -dPDFSETTINGS=/prepress \
  -sDEVICE=pdfwrite \
   zhang11a.pdf 

The -dPDFSETTINGS=/prepress parameter tells Ghostscript to embed all non-embedded fonts.

This is what the font embedding status of the repaired PDF now looks like:

kupfe@abc:~$ pdffonts gs-repaired---zhang11a.pdf
name                             type      emb sub uni object ID
-------------------------------- --------- --- --- --- ---------
AFNVKD+Times-Italic              Type 1C   yes yes no      12  0   
PEQXED+CMSY10                    Type 1C   yes yes yes     14  0   
FYXQNZ+Times-Roman               Type 1C   yes yes no       8  0    
XILTND+Times-Bold                Type 1C   yes yes no      10  0   
HZJMVE+Symbol                    Type 1C   yes yes no      36  0   
EGYAWT+CMR10                     Type 1C   yes yes no      26  0   
AQGZYJ+CMMI10                    Type 1C   yes yes yes     28  0   
YJATHO+ZapfChancery-MediumItalic Type 1C   yes yes no      30  0   
CZXDRN+MSBM10                    Type 1C   yes yes no      32  0   
KTZJPT+CMEX10                    Type 1C   yes yes no      34  0   
NYTDMD+CMMI10                    Type 1C   yes yes no      58  0   
DFQTPB+CMMI7                     Type 1C   yes yes no      60  0   
GXJYGS+CMBX10                    Type 1C   yes yes no      62  0   
QAMUEV+CMMI5                     Type 1C   yes yes no      64  0   
QEWIFQ+CMEX9                     Type 1C   yes yes no      76  0   
KNOSJH+CMSS10                    Type 1C   yes yes no      88  0   
UCHHLK+Courier                   Type 1C   yes yes no      95  0   
TWNVND+Helvetica                 Type 1C   yes yes no     102  0  
ZDIWNO+CMR10                     Type 1C   yes yes no     139  0  
IGJFUT+CMMI10                    Type 1C   yes yes no     141  0  

I checked how Evince renders the repaired PDF: it is OK now.


Update:

Martin Schröder is right in stating that, according to the ISO PDF standard, none of the 'Base 14' PDF fonts needs to be embedded. (These are the four variations 'standard', 'italic', 'bold' and 'bold-italic' of Helvetica, Times and Courier, plus Symbol and ZapfDingbats.) All PDF viewers should provide their own means of rendering all glyphs in these fonts, even when they are not embedded in the file.

In practice, following this recommendation has led to many problems (one such case is on display in this very question), because not all viewers, renderers and automatic PDF processors reliably render the glyphs of non-embedded fonts. That is the reason why all current ISO standards for PDF/A (archiving) and PDF/X (blind eXchange) require all fonts, even the 'Base 14' ones, to be embedded in PDF files. Otherwise the file is not deemed compliant with the respective standard.

And as my Ghostscript command's result shows: embedding the Symbol font does reliably avoid the ∑ glyph rendering problem for Evince. Even if you consider it an Evince bug (which you rightly can) that it doesn't correctly render the original PDF...

Kurt Pfeifle answered Sep 22 '22 15:09