Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference: "Compile PDF" button in RStudio vs. knit() and knit2pdf()

Tags:

r

rstudio

knitr

TL;DR

What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF"1 button in RStudio?

Motivation

Most users of knitr seem to write their documents in RStudio and compile the documents using the "Compile PDF" / "Knit HTML" button. This works smoothly most of the time, but every once a while there are special requirements that cannot be achieved using the compile button. In these cases, the solution is usually to call knit()/knit2pdf()/rmarkdown::render() (or similar functions) directly.

Some examples:

  • How to knit/Sweave to a different file name?
  • Is there a way to knitr markdown straight out of your workspace using RStudio?
  • Insert date in filename while knitting document using RStudio Knit button

Using knit2pdf() instead of the "Compile PDF" button usually offers a simple solution to such questions. However, this comes at a price: There is the fundamental difference that "Compile PDF" processes the document in a separate process and environment whereas knit2pdf() and friends don't.

This has implications and the problem is that not all of these implications are obvious. Take the fact that knit() uses objects from the global environment (whereas "Compile PDF" does not) as an example. This might be obvious and the desired behavior in cases like the second example above, but it is an unexpected consequence when knit() is used to overcome problems like in example 1 and 3.

Moreover, there are more subtle differences:

  • The working directory might not be set as expected.
  • Packages need to be loaded.
  • Some options that are usually set by RStudio may have unexpected values.

The Question and it's goal

Whenever I read/write the advice to use knit2pdf() instead of "Compile PDF", I think "correct, but the user should understand the consequences …".

Therefore, the question here is:

What are the (possibly unwanted) side-effects of using knit()/knit2pdf() instead of the "Compile PDF" button in RStudio?

If there was a comprehensive (community wiki?) answer to this question, it could be linked in future answers that suggest using knit2pdf().

Related Questions

There are dozens of related questions to this one. However, they either propose only code to (more or less) reproduce the behavior of the RStudio button or they explain what "basically" happens without mentioning the possible pitfalls. Others look like being very similar questions but turn out to be a (very) special case of it. Some examples:

  • Knit2html not replicating functionality of Knit HTML button in R Studio: Caching issue.
  • HTML outputs are different between using knitr in Rstudio & knit2html in command line: Markdown versions.
  • How to convert R Markdown to HTML? I.e., What does “Knit HTML” do in Rstudio 0.96?: Rather superficial answer by Yihui (explains what "basically" happens) and some options how to reproduce the behavior of the RStudio button. Neither the suggested Sys.sleep(30) nor the "Compile PDF" log are insightful (both hints point to the same thing).
  • What does “Knit HTML” do in Rstudio 0.98?: Reproduce behavior of button.

About the answer

I think this question raised many of the issues that should be part of an answer. However, there might be many more aspects I don't know about which is the reason why I am reluctant to self-answer this question (though I might try if nobody answers).

Probably, an answer should cover three main points:

  • The new session vs. current session issue (global options, working directory, loaded packages, …).
  • A consequence of the first point: The fact that knit() uses objects from the calling environment (default: envir = parent.frame()) and implications for reproducibility. I tried to tackle the issue of preventing knit() from using objects from outside the document in this answer (second bullet point).
  • Things RStudio secretly does …
    • … when starting an interactive session (example) --> Not available when hitting "Compile PDF"
    • … when hitting "Compile PDF" (anything special besides the new session with the working directory set to the file processed?)

I am not sure about the right perspective on the issue. I think both, "What happens when I hit 'Compile PDF' + implications" as well as "What happens when I use knit() + implications" is a good approach to tackle the question.


1 The same applies to the "Knit HTML" button when writing RMD documents.

like image 465
CL. Avatar asked Jan 04 '16 12:01

CL.


1 Answers

First of all, I think this question is easier to answer if you limit the scope to the "Compile PDF" button, because the "Knit HTML" button is a different story. "Compile PDF" is only for Rnw documents (R + LaTeX, or think Sweave).

I'll answer your question following the three points you suggested:

  1. Currently RStudio always launch a new R session to compile Rnw documents, and first changes the working directory to the directory of the Rnw file. You can imagine the process as a shell script like this:

    cd path/to/your-Rnw-directory
    Rscript -e "library(knitr); knit('your.Rnw')"
    pdflatex your.tex
    

    Note that the knitr package is always attached, and pdflatex might be other LaTeX engines (depending on your RStudio configurations for Sweave documents, e.g., xelatex). If you want to replicate it in your current R session, you may rewrite the script in R:

    owd = setwd("path/to/your-Rnw-directory")
    system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')"))
    system2("pdflatex", "your.tex")
    setwd(owd)
    

    which is not as simple as knitr::knit('path/to/your.Rnw'), in which case the working directory is not automatically changed, and everything is executed in the current R session (in the globalenv() by default).

  2. Because the Rnw document is always compiled in a new R session, it won't use any objects in your current R session. This is hard to replicate only through the envir argument of knitr::knit() in the current R session. In particular, you cannot use knitr::knit(envir = new.env()) because although new.env() is a new environment, it has a default parent environment parent.frame(), which is typically the globalenv(); you cannot use knitr::knit(envir = emptyenv()), either, because it is "too clean", and you will have trouble with objects even in the R base package. The only reliable way to replicate what the "Compile PDF" button does is what I said in 1: system2("Rscript", c("-e", shQuote("library(knitr); knit('your.Rnw')")), in which case knit() uses the globalenv() of a new R session.

  3. I'm not entirely sure about what RStudio does for the repos option. It probably automatically sets this option behind the scenes if it is not set. I think this is a relatively minor issue. You can set it in your .Rprofile, and I think RStudio should respect your CRAN mirror setting.

Users have always been asking why the Rnw document (or R Markdown documents) are not compiled in the current R session. To us, it basically boils down to which of the following consequences is more surprising or undesired:

  1. If we knit a document in the current R session, there is no guarantee that your results can be reproduced in another R session (e.g., the next time you open RStudio, or your collaborators open RStudio on their computers).
  2. If we knit a document in a new R session, users can be surprised that objects are not found (and when they type the object names in the R console, they can see them). This can be surprising, but it is also a good and early reminder that your document probably won't work the next time.

To sum it up, I think:

  • Knitting in a new R session is better for reproducibilty;

  • Knitting in the current R session is sometimes more convenient (e.g., you try to knit with different temporary R objects in the current session). Sometimes you also have to knit in the current R session, especially when you are generating PDF reports programmatically, e.g., you use a (for) loop to generate a series of reports. There is no way that you can achieve this only through the "Compile PDF" button (the button is mostly only for a single Rnw document).

BTW, I think what I said above can also apply to the Knit or Knit HTML buttons, but the underlying function is rmarkdown::render() instead of knitr::knit().

like image 126
Yihui Xie Avatar answered Nov 18 '22 02:11

Yihui Xie