
Using Unicode 'dingbat-like' glyphs in R graphics, across devices & platforms, especially PDF

Some of you may have seen my blog post on this topic, where I wrote the following code after wanting to help a friend produce half-filled circles as points on a graph:

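# Plot every code point from `start` to `end` on a square grid, one glyph per cell.
# Character arguments are read as hex digits; numeric arguments as decimal code points.
TestUnicode <- function(start="25a0", end="25ff", ...)
  {
    nstart <- as.hexmode(start)
    nend <- as.hexmode(end)
    r <- nstart:nend
    s <- ceiling(sqrt(length(r)))   # side length of the grid
    par(pty="s")
    plot(c(-1,(s)), c(-1,(s)), type="n", xlab="", ylab="",
         xaxs="i", yaxs="i")
    grid(s+1, s+1, lty=1)
    for(i in seq_along(r)) {
      # a negative pch selects the Unicode code point abs(pch)
      try(points(i%%s, i%/%s, pch=-1*r[i],...))
    }
  }

TestUnicode(9500,9900)   # decimal code points, i.e. U+251C to U+26AC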
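One subtlety in that call: as.hexmode() reads character input as hex digits but numeric input as plain decimal, so the defaults ("25a0", "25ff") and the numeric arguments (9500, 9900) are on different scales. A small sketch to check which code points are actually covered (the variable names here are just for illustration):

```r
# Character input is parsed as hexadecimal: "25a0" is U+25A0.
from_string <- as.integer(as.hexmode("25a0"))

# Numeric input is taken as a decimal integer: 9500 is U+251C, not U+9500.
from_number <- as.integer(as.hexmode(9500))

# Show the numeric range from the call above as hex code points.
range_hex <- format(as.hexmode(c(9500, 9900)))
print(range_hex)   # "251c" "26ac"

# The half-filled circles live at U+25D0 to U+25D3, inside both ranges.
half_circles <- 0x25D0:0x25D3
print(intToUtf8(half_circles, multiple = TRUE))
```

So the numeric call still covers the half-circle glyphs; it just starts and ends at different code points than the hex-string defaults.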

TestUnicode(9500,9900) works (i.e. produces a nearly-full grid of cool dingbatty symbols):

  • on Ubuntu 10.04, in an X11 or PNG device
  • on Mandriva Linux, same devices, with a locally built R, once pango-devel was installed

It fails to varying degrees (i.e. produces a grid partly or entirely filled with dots or empty rectangles), either silently or with warnings:

  • on the same Ubuntu 10.04 machine in PDF or PostScript (tried setting font="NimbusSan" to use URW fonts, doesn't help)
  • on Mac OS X 10.6 (quartz, X11, Cairo, PDF)

For example, trying all the available PDF font families:

flist <- c("AvantGarde", "Bookman","Courier", "Helvetica", "Helvetica-Narrow",
        "NewCenturySchoolbook", "Palatino", "Times","URWGothic",
        "URWBookman", "NimbusMon", "NimbusSan", "NimbusSanCond",
        "CenturySch", "URWPalladio","NimbusRom")

for (f in flist) {
  fn <- paste("utest_",f,".pdf",sep="")
  pdf(fn,family=f)
  TestUnicode()
  title(main=f)
  dev.off()
  embedFonts(fn)
}

On Ubuntu, none of these files contains the symbols.

It would be nice to get it to work on as many combinations as possible, but especially in some vector format and double-especially in PDF.

Any suggestions about font/graphics device configurations that would make this work would be welcomed.

Ben Bolker asked May 04 '11 15:05


2 Answers

I think you are out of luck, Ben: according to some notes by Paul Murrell, pdf() can only handle single-byte encodings. Multi-byte encodings need to be converted to the single-byte equivalent, and therein lies the rub; by definition, a single-byte encoding cannot contain all the glyphs that can be represented in a multi-byte encoding such as UTF-8.
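To make that limitation concrete, here is a minimal sketch (mine, not from Paul's notes) that tries to convert one of the half-circle glyphs to Latin-1, a typical single-byte encoding. I am assuming the usual iconv() behaviour of returning NA for text it cannot convert, which is what R documents but which can vary with the platform's iconv implementation:

```r
# U+25D0 (circle with left half black), built from its code point
glyph <- intToUtf8(0x25D0)

# In UTF-8 the glyph needs three bytes ...
glyph_bytes <- charToRaw(glyph)
print(glyph_bytes)

# ... and it has no slot in a single-byte encoding such as Latin-1,
# so the conversion is expected to fail and yield NA.
as_latin1 <- iconv(glyph, from = "UTF-8", to = "latin1")
print(as_latin1)

# For contrast, an accented letter (U+00E9) does exist in Latin-1.
e_acute <- iconv(intToUtf8(0x00E9), from = "UTF-8", to = "latin1")
print(nchar(e_acute, type = "bytes"))
```

A device that can only address glyphs through such an encoding has no way to ask the font for U+25D0, whatever font family you give it.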

Paul's notes can be found here; in them he suggests a couple of solutions based on Cairo PDF devices: cairo_pdf() on suitably endowed Linux and Mac OS systems, or the Cairo package under MS Windows.
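A minimal sketch of the cairo_pdf() route, assuming an R build with cairo support and a UTF-8 locale (both are checked below). The output file name and the choice of code points are just for illustration, and whether the glyphs actually appear still depends on the fonts installed on your system:

```r
# Hypothetical output location, chosen for this example only
out <- file.path(tempdir(), "halfcircles.pdf")

# cairo_pdf() needs R built with cairo; Unicode pch values need a UTF-8 locale
if (capabilities("cairo") && isTRUE(l10n_info()[["UTF-8"]])) {
  cairo_pdf(out, width = 5, height = 5)
  # negative pch values are taken as Unicode code points (U+25D0 to U+25D3)
  plot(1:4, rep(1, 4), pch = -(0x25D0:0x25D3), cex = 3,
       xlab = "", ylab = "", main = "Half-filled circles via cairo_pdf()")
  dev.off()
} else {
  message("cairo or a UTF-8 locale is not available; nothing was written")
}
```

On Windows, the CairoPDF() device from the Cairo package would take the place of cairo_pdf() in this sketch.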

Gavin Simpson answered Oct 09 '22 16:10


I have found the cairo_pdf device to be completely insufficient: the output is markedly different from both pdf and on-screen rendering, and its plotmath support is sketchy.

However, there’s a rather simple workaround on OS X: Use the “normal” quartz device and set its type to pdf:

quartz(type = 'pdf', file = 'output.pdf')

Unfortunately, on my computer this ignores the font family and always uses Helvetica (although the documentation claims that the default is Arial).
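For completeness, a slightly fuller sketch of the quartz route: open the device, draw, close it. The file name and the code points are illustrative, and the guard is my addition so that the snippet does nothing on systems without a quartz-capable R build:

```r
# Hypothetical output location, chosen for this example only
out <- file.path(tempdir(), "halfcircles_quartz.pdf")

on_mac <- identical(Sys.info()[["sysname"]], "Darwin")

# quartz() is only usable on macOS builds of R with aqua support
if (on_mac && capabilities("aqua") && isTRUE(l10n_info()[["UTF-8"]])) {
  quartz(type = "pdf", file = out, width = 5, height = 5)
  # negative pch values are taken as Unicode code points (U+25D0 to U+25D3)
  plot(1:4, rep(1, 4), pch = -(0x25D0:0x25D3), cex = 3,
       xlab = "", ylab = "")
  dev.off()
} else {
  message("quartz is not available here; nothing was written")
}
```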

There are at least two other gotchas:

  • pdf converts hyphens to minuses. This may not even always be what you want but it’s quite useful to properly typeset negative numbers. The linked thread describes workarounds for this.
  • It’s of course platform-specific and only works on OS X.

(I realise that OP briefly mentions the Quartz device but this thread is frequently viewed and I think this solution needs more prominence.)

Konrad Rudolph answered Oct 09 '22 15:10