
Computationally heavy R vignettes

I'm currently converting a JSS article, which uses knitr, to an R package vignette. However, I'm unsure about the vignette's placement and structure, and about how to handle the very long computation time it requires, which is ~2 days on an ordinary laptop.

The official documentation offers little to no information on this. A short note in an answer on the mailing list is the only information I found when searching. Brian Ripley writes:

In particular, CRAN does accept packages with Sweave vignettes that take too long to check -- one takes ca 8 hours [...]. We just ask that we are told so on submission.

Hadley Wickham's description of vignettes says to set eval = FALSE as the chunk option. However, this is not a viable approach in my case as the generated data from the computations are needed.

This presentation suggests that /inst/doc is to be used for pre-compiled and heavy vignettes. However, that does not agree very well with the new guidelines on using /vignettes for package vignettes (or does it?).

Currently, I've placed my source files in /vignettes, and I create an .RData file which contains the most computationally expensive objects (and which is also quite large). The scripts then check whether the objects are available through that .RData file; if not, the objects are created. So to compile and run completely from scratch, the .RData file can simply be deleted.
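
A minimal sketch of the caching pattern just described, using base R only. The file name `heavy_results.RData`, the object name `heavy_results`, and the function `compute_heavy_results()` are hypothetical stand-ins, and `tempdir()` is used only so the sketch runs on its own; in a vignette the path would presumably point at a file kept next to the vignette source.

```r
# Hypothetical cache location; tempdir() keeps this sketch self-contained.
# In a real vignette this would be a path alongside the vignette source.
cache_file <- file.path(tempdir(), "heavy_results.RData")

# Stand-in for the expensive computation (the real one takes ~2 days).
compute_heavy_results <- function() {
  Sys.sleep(0.1)
  data.frame(x = 1:5, y = (1:5)^2)
}

if (file.exists(cache_file)) {
  # Cache hit: load() restores `heavy_results` into the current environment.
  load(cache_file)
} else {
  # Cache miss: compute the object and store it for later runs.
  heavy_results <- compute_heavy_results()
  save(heavy_results, file = cache_file)
}
```

Deleting the cache file forces the `else` branch on the next run, which matches the "compile from scratch" behaviour described above.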

Does anyone have experience or pointers regarding this problem? Should the vignette be in /vignettes or /inst/doc? If the former is preferred, where do I place the needed files such as .bib, .RData, etc.? I must admit I find the /vignettes vs /inst/doc distinction somewhat confusing.

Anders Ellern Bilgrau asked Mar 10 '15 10:03



1 Answer

I present the following solution for knitr-based vignettes, assuming you use devtools for package maintenance. To prevent R from running vignette code during package checks (i.e., R CMD check), and so be able to include computationally heavy vignettes, the vignette must:

  1. Employ a vignette engine that does not tangle the R code. In other words, the engine must not produce .R files in inst/doc when you execute devtools::build_vignettes(). The knitr package provides engines that do not tangle R code, including knitr::rmarkdown_notangle, which can be used as a drop-in replacement for knitr::rmarkdown.
  2. Include code that disables chunk evaluation dynamically when it detects that it is being executed within a call to R CMD check. This can be achieved by placing code at the top of the vignette that inspects the session for signs of a check run and sets the chunk options using knitr::opts_chunk$set(eval = ...) accordingly. The code shown below was borrowed from the knitr package, so many thanks to Yihui Xie for working out how to do this.

Below is an example of an rmarkdown vignette file that uses these two strategies, so that it can be built using devtools::build_vignettes() and will not have its code executed during R CMD check. Note that the code is still executed whilst building the package (e.g., during devtools::build() and devtools::check()).

---
title: "example vignette"
output:
  rmarkdown::html_document:
    self_contained: yes
fontsize: 11pt
documentclass: article
vignette: >
  %\VignetteIndexEntry{example vignette}
  %\VignetteEngine{knitr::rmarkdown_notangle}
---

```{r, include = FALSE}
is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
             "_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
knitr::opts_chunk$set(eval = !is_check)
```

```{r}
Sys.sleep(100)
```

For examples of this approach in the wild, see this vignette for a developmental package on GitHub.

paleo13 answered Sep 27 '22 16:09
