
knitr called from RStudio does not preserve the order in which packages are loaded

When I render R Markdown (Rmd) files that use cached chunks with RStudio's Knit HTML button, the order in which packages were loaded is not preserved from chunk to chunk. This causes problems when I need to load packages in a specific order to avoid namespace clashes.

For a reproducible example (it requires the plyr, dplyr, and pryr packages; installation instructions are at the end of the question), I start by creating a knitr Rmd document that loads plyr and then dplyr (both export a summarise function), then uses pryr to determine which summarise function is found. I knit this using RStudio's "Knit HTML" button:

```{r}
library(knitr)
opts_chunk$set(cache = TRUE, message = FALSE)
```

```{r test1}
library(plyr)
library(dplyr)
```

```{r test2, dependson = "test1"}
attr(pryr::where("summarise"), "name")
```

As recommended here, I load plyr before dplyr so that dplyr's functions should come first in the search path. As expected, the output md file shows that the summarise function comes from dplyr:

    attr(pryr::where("summarise"), "name")
    ## [1] "package:dplyr"
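For readers less familiar with R's search path, the lookup rule at work here can be sketched in base R alone, without plyr or dplyr installed. The two lists and the names `demo:plyr` and `demo:dplyr` are made up for illustration; they only stand in for two packages that export the same function name.

```r
# Two stand-in "packages" (plain lists) that both define summarise().
# These are illustrative objects, not the real plyr/dplyr packages.
demo_plyr  <- list(summarise = function() "from demo_plyr")
demo_dplyr <- list(summarise = function() "from demo_dplyr")

# attach() inserts at position 2 of the search path by default,
# so whatever is attached last is searched first.
attach(demo_plyr,  name = "demo:plyr",  warn.conflicts = FALSE)
attach(demo_dplyr, name = "demo:dplyr", warn.conflicts = FALSE)

# An unqualified call resolves to the entry nearest the front of search().
winner    <- summarise()
positions <- match(c("demo:dplyr", "demo:plyr"), search())

# Clean up so the session's search path is left as it was.
detach("demo:dplyr", character.only = TRUE)
detach("demo:plyr",  character.only = TRUE)
```

Here `winner` comes from the entry attached last, which is why the order of the `library()` calls matters in the chunks above.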

However, if I make some small change in the test2 chunk:

```{r test2, dependson = "test1"}
attr(pryr::where("summarise"), "name")  # this is a change
```

that causes it to be recompiled, it now loads the packages in the wrong order, and summarise is found in plyr:

    attr(pryr::where("summarise"), "name")  # this is a change
    ## [1] "package:plyr"

Note that this problem does not appear when running knit from an interactive R session, but only because that session keeps plyr and dplyr attached in the right order; if I restart R, the same problem occurs.

I am aware that I can write dplyr::summarise explicitly to avoid the ambiguity, but this is rather cumbersome. Not loading plyr at all is not an option, since several packages load it as a side effect. How can I ensure the packages load in the desired order?

I am using the latest version of RStudio (0.98.1079), and my sessionInfo is below:

    ## R version 3.1.1 (2014-07-10)
    ## Platform: x86_64-apple-darwin13.1.0 (64-bit)
    ## 
    ## locale:
    ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    ## 
    ## attached base packages:
    ## [1] stats     graphics  grDevices utils     datasets  methods   base     
    ## 
    ## other attached packages:
    ## [1] plyr_1.8.1    dplyr_0.3.0.2 knitr_1.7    
    ## 
    ## loaded via a namespace (and not attached):
    ##  [1] assertthat_0.1   codetools_0.2-8  DBI_0.3.0        digest_0.6.4    
    ##  [5] evaluate_0.5.5   formatR_1.0      htmltools_0.2.4  magrittr_1.0.0  
    ##  [9] parallel_3.1.1   pryr_0.1.0.9000  Rcpp_0.11.2      rmarkdown_0.3.10
    ## [13] rstudioapi_0.1   stringr_0.6.2    tools_3.1.1

If needed, you can install the packages for this reproducible example with:

```{r}
install.packages(c("devtools", "plyr", "dplyr"))
devtools::install_github("hadley/pryr")
```
asked Oct 21 '14 17:10 by David Robinson


2 Answers

My pull request to knitr addresses this issue by preserving the order of the search path in the __packages file. The relevant code is:

    x = rev(.packages())
    if (file.exists(path)) 
        x = setdiff(c(readLines(path), x), .base.pkgs)
    writeLines(x, path)

@Yihui merged the request as of this commit, and it will likely be available in knitr v1.8 on CRAN (or immediately from GitHub or RForge).

There may still be issues when packages are loaded in different chunks that do not depend on each other, but this fixes the example in the question above and the other cases I've tried.
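To see what the quoted snippet does to the recorded order, here is a minimal sketch that replays the same logic against a temporary file. The helper `record_packages()` and the vector `base_pkgs` are hypothetical stand-ins: the real code reads `.packages()` directly and uses knitr's internal `.base.pkgs`, whose exact contents I am assuming here. The package name `tidyr` is only an example of something attached later.

```r
# Assumed stand-in for knitr's internal .base.pkgs vector.
base_pkgs <- c("stats", "graphics", "grDevices", "utils",
               "datasets", "methods", "base")

# Hypothetical helper mirroring the quoted logic. `attached` plays the role
# of .packages(), which lists the most recently attached package first.
record_packages <- function(path, attached, base_pkgs) {
  x <- rev(attached)  # oldest first, i.e. the order of the library() calls
  if (file.exists(path)) {
    # Keep the order already on disk, append anything new, drop base packages.
    x <- setdiff(c(readLines(path), x), base_pkgs)
  }
  writeLines(x, path)
  invisible(x)
}

path <- tempfile("__packages")

# Run 1: plyr was attached first, then dplyr.
run1 <- record_packages(path, c("dplyr", "plyr", base_pkgs), base_pkgs)

# Run 2: a later chunk has also attached tidyr.
run2 <- record_packages(path, c("tidyr", "dplyr", "plyr", base_pkgs), base_pkgs)
```

After the second run the file lists plyr before dplyr, so attaching the packages in file order (for instance with `library(p, character.only = TRUE)` in a loop) would leave dplyr in front of plyr on the search path.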

answered Nov 08 '22 09:11 by David Robinson


Posting this as an answer since it seems more substantial than a comment.

tl;dr: try removing cache/__packages manually between runs (and adding cache=FALSE to your package-loading chunk/doing without package caching) and see if that solves the problem ... or even add

    if (file.exists("cache/__packages")) unlink("cache/__packages")

(I haven't actually tested this on your example.)

I've had a lot of trouble with package caching, especially in working directories where I'm running lots of examples with not-necessarily-compatible package sets. I often just remove cache/__packages by hand. I imagine the design could be improved (but I haven't bothered to construct examples/think about how the design would be improved).
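The manual cleanup described above could be wrapped in a small helper, sketched here under the assumption that the cache directory is named `cache` as in the one-liner. The function name `clear_package_cache()` is my own, not part of knitr, and the demonstration runs against a temporary directory so that no real cache is touched.

```r
# Hypothetical helper: remove the recorded package list from a knitr cache
# directory. Returns (invisibly) whether the file existed beforehand.
clear_package_cache <- function(cache_dir = "cache") {
  pkg_file <- file.path(cache_dir, "__packages")
  existed  <- file.exists(pkg_file)
  unlink(pkg_file)  # unlink() does not fail when the file is absent
  invisible(existed)
}

# Demonstration against a throwaway directory.
demo_dir <- file.path(tempdir(), "demo_cache")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("dplyr", "plyr"), file.path(demo_dir, "__packages"))

had_file_first  <- clear_package_cache(demo_dir)  # file was there
had_file_second <- clear_package_cache(demo_dir)  # already gone, still safe
```

Calling the helper at the top of a document, before any cached chunk runs, would have the same effect as deleting the file by hand between runs.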

answered Nov 08 '22 09:11 by Ben Bolker