
Problems with ggplot and pgfSweave

I started using Sweave some time ago. However, like most people, I soon ran into a major problem: speed. Sweaving a large document takes ages, which makes efficient work quite challenging. Data processing can be sped up considerably with cacheSweave. However, plots - especially ggplot ;) - still take too long to render. That’s why I want to use pgfSweave.

After many, many hours, I finally succeeded in setting up a working system with Eclipse/StatET/Texlipse. I then wanted to convert an existing report for use with pgfSweave and got a bad surprise: most of my ggplots don’t seem to work anymore. The following plot, for example, works perfectly in the console and with Sweave:

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

Running it with pgfSweave, however, I get this error:

Error in if (width > 0) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (width > 0) { :
  the condition has length > 1 and only the first element will be used
Error in driver$runcode(drobj, chunk, chunkopts) : 
  Error in if (width > 0) { : missing value where TRUE/FALSE needed

When I remove aes(...) from geom_point, the plot works perfectly with pgfSweave.

pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)

Edit: I investigated further and could narrow the problem down to the tikz device.

This works just fine:

quartz()
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)

This gives the above error:

tikz( 'myPlot.tex',standAlone = T )
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point(aes(colour=que_id))
print(pl)
dev.off()

This works just fine as well:

tikz( 'myPlot.tex',standAlone = T )
pl <- ggplot(plot_info,aes(elevation,area))
pl <- pl + geom_point()
print(pl)
dev.off()

I could reproduce this with 5 different ggplots. When colour (or size, alpha, ...) is not used in the mapping, the plot works with tikz.

Q1: Does anybody have any explanations for this behavior?

Additionally, caching of non-plot code chunks doesn’t work very well. The following code chunk takes no time at all with Sweave. With pgfSweave, it takes approximately 10 sec.

<<plot.opts,echo=FALSE,results=hide,cache=TRUE>>=
#colour and plot options are globally set
pal1 <- brewer.pal(8,"Set1")
pal_seq <- brewer.pal(8,"YlOrRd")
pal_seq <- c("steelblue1","tomato2")
opt1 <- opts(panel.grid.major = theme_line(colour = "white"),panel.grid.minor = theme_line(colour = "white"))
sca_fill_cont_opt <- scale_fill_continuous(low="steelblue1", high="tomato2")
ory <- geom_hline(yintercept=0,alpha=0.4,linetype=2) 
orx <- geom_vline(xintercept=0,alpha=0.4,linetype=2)
ts1 <- 2.3
ts2 <- 2.5
ts3 <- 2.8
ps1 <- 6
offset_x <- function(x,y) 0.15*x/pmax(abs(x),abs(y))
offset_y <- function(x,y) 0.05*y/pmax(abs(x),abs(y))
plot_size <- 50*50
@

This seems pretty strange behavior as well, since the chunk only sets some variables for later use.

Q2: Anybody got any explanations for that?

Q3: More generally, I would like to ask whether anybody at all is using pgfSweave successfully. By successfully I mean that everything that works in Sweave also works in pgfSweave, with the additional benefit of nice fonts and improved speed. ;)

Thanks very much for responses!

donodarazao asked Nov 17 '10 22:11


3 Answers

Q1: Does anybody have any explanations for this behavior?

Three facts combine to make tikzDevice fail when it tries to construct your plot:

  • When you add an aesthetic mapping that creates a legend, such as aes(colour=que_id), ggplot2 will use the variable name as the title of the legend, in this case que_id.

  • The tikzDevice passes all strings, such as legend titles, to LaTeX for typesetting.

  • In LaTeX the underscore character, _, is used to denote a subscript. If an underscore is used outside of math mode, it causes an error.

When the tikzDevice tries to calculate the height and width of the legend title, "que_id", it passes the string to LaTeX for typesetting and expects LaTeX to return the width and height of the string. LaTeX fails because the string contains an unescaped underscore outside of math mode. The tikzDevice receives a NULL for the string width instead of a number, which causes an if (width > 0) check to fail.
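
As a rough illustration of how a failed width measurement turns into the error quoted in the question, here is a minimal base R sketch. The value that tikzDevice actually receives is internal to the package; the sketch simply assumes a missing value (NA), an assumption chosen because it reproduces the quoted message.

```r
# Assumption: stand-in for a width that LaTeX could not measure.
width <- NA_real_

# An if() whose condition evaluates to NA stops with an error;
# tryCatch() captures the message instead of aborting the script.
msg <- tryCatch(
  {
    if (width > 0) "has width" else "no width"
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

In an English locale the captured message is the same "missing value where TRUE/FALSE needed" text that appears in the question.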

Ways to avoid the problem

  1. Specify a legend title to use by adding a color scale:

    p1 <- ggplot(plot_info, aes(elevation, area))
    p1 <- p1 + geom_point(aes(colour=que_id))
    
    
    # Add a name that is easier for humans to read than the variable name
    p1 <- p1 + scale_colour_brewer(name="Que ID")
    
    
    # Or, replace the underscore with the appropriate LaTeX escape sequence
    p1 <- p1 + scale_colour_brewer(name="que\\textunderscore id")
    
  2. Use the string sanitization feature introduced in tikzDevice 0.5.0 (it was broken until 0.5.2). Currently, string sanitization escapes only the following characters by default: %, $, {, }, and ^. However, you can specify additional substitution pairs via the tikzSanitizeCharacters and tikzReplacementCharacters options:

    # Add underscores to the sanitization list
    options(tikzSanitizeCharacters = c('%','$','}','{','^', '_'))
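    # The trailing {} ends the macro name, so "que_id" does not become
    # the undefined command \textunderscoreid
    options(tikzReplacementCharacters = c('\\%','\\$','\\}','\\{',
      '\\^{}', '\\textunderscore{}'))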
    
    
    # Turn on string sanitization when starting the plotting device
    tikz('myPlot.tex', standAlone = TRUE, sanitize = TRUE)
    print(p1)
    dev.off()
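
If neither option fits, the escaping can also be done by hand before the label reaches the device. The following is only a sketch: latex_safe is a hypothetical helper name, not part of tikzDevice or ggplot2, and it handles underscores only.

```r
# Hypothetical helper (not part of tikzDevice): escape underscores so
# a variable name can be typeset by LaTeX outside of math mode.
latex_safe <- function(x) {
  # fixed = TRUE makes gsub() treat pattern and replacement literally,
  # so the R string "\\textunderscore{}" is inserted as \textunderscore{}
  gsub("_", "\\textunderscore{}", x, fixed = TRUE)
}

legend_title <- latex_safe("que_id")
cat(legend_title, "\n")
```

The result could then be passed as the legend title, for example scale_colour_brewer(name = latex_safe("que_id")).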
    

We will be publishing version 0.5.3 of the tikzDevice in the next couple of weeks in order to address some annoying warning messages that now show up due to changes in the way R handles system(). I will add the following changes to this next version:

  • Better warning message when width is NULL indicating that there is probably something wrong with plot text.

  • Add underscores and a few other characters to the default set of characters that the string sanitizer looks for.

Hope this helps!

Sharpie answered Nov 10 '22 22:11


Q2: I am the maintainer of pgfsweave.

Here are the results of a test I ran:

time R CMD Sweave time-test.Rnw 

real    0m1.133s
user    0m1.068s
sys     0m0.054s

time R CMD pgfsweave time-test.Rnw 

real    0m2.941s
user    0m2.413s
sys     0m0.364s

time R CMD pgfsweave time-test.Rnw 

real    0m2.457s
user    0m2.112s
sys     0m0.283s

I believe there are two reasons for the time difference, but it would take more work to verify them exactly:

  • pgfSweave does a ton of checking and double checking to make sure that it is not redoing expensive computations. The goal is to make it feasible to do more expensive calculations and plotting within a document. The scale of "expensive" in this case is much more than the additional second or two to do checks.

To see the real benefits of caching, consider the following test file:

\documentclass{article}

\begin{document}

<<plot.opts,cache=TRUE>>=
x <- Sys.sleep(10)
@

\end{document}

And the results:

time R CMD Sweave time-test2.Rnw 

real    0m10.334s
user    0m0.283s
sys     0m0.047s

time R CMD pgfsweave time-test2.Rnw 

real    0m12.032s
user    0m1.356s
sys     0m0.349s

time R CMD pgfsweave time-test2.Rnw 

real    0m1.423s
user    0m1.121s
sys     0m0.266s

  • Sweave has undergone some changes in R 2.12. The changes may have sped up the process of code chunk evaluation and left pgfSweave behind for these smaller calculations. Worth looking into.
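
The fast second pgfsweave run in the Sys.sleep test comes from not re-evaluating a chunk whose code is unchanged. The idea can be sketched in base R as below. This is a toy illustration under my own assumptions, not pgfSweave's actual mechanism; run_chunk, chunk_cache, and run_count are hypothetical names.

```r
# Toy in-memory cache keyed by the chunk's source text (assumption:
# a real implementation would persist results to disk between runs).
chunk_cache <- new.env(parent = emptyenv())
run_count <- 0

run_chunk <- function(code) {
  if (!exists(code, envir = chunk_cache, inherits = FALSE)) {
    # Cache miss: evaluate the chunk once and store its value
    run_count <<- run_count + 1
    value <- eval(parse(text = code), envir = globalenv())
    assign(code, value, envir = chunk_cache)
  }
  # Cache hit (or freshly stored value): return without re-running
  get(code, envir = chunk_cache, inherits = FALSE)
}

first  <- run_chunk("Sys.sleep(0.1); 42")  # evaluated, pays the sleep
second <- run_chunk("Sys.sleep(0.1); 42")  # served from the cache
```

The bookkeeping around the cache lookup is the kind of fixed overhead that only pays off when the chunk itself is expensive.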

Q3: I use pgfSweave all the time for my own work. Some changes to Sweave in R 2.12 have been causing minor problems with pgfSweave, but a new version that fixes them is forthcoming. The development version on GitHub (https://github.com/cameronbracken/pgfSweave) already has the changes. If you are having additional problems, I would be happy to help.

cameron.bracken answered Nov 11 '22 00:11


Q2: Do you use \pgfrealjobname{<DOCUMENTNAME>} in the header and the option external=TRUE for the graphics chunks? I've found that this increases the speed a lot (not for the first compilation, but for subsequent ones if the graphics are unchanged). You'll find more background in the pgfSweave vignette.

Q3: Everything works fine for me. I use Windows + Eclipse/StatET/Texlipse, like you.

fabians answered Nov 11 '22 00:11