
How can knitr's cached results be used to reproduce the environment in a given chunk?

tl;dr

My question: Within an R session, is there some way to use knitr's cached results to 'fast-forward' to the environment (i.e. the set of objects) available in a given code block, in the same sense that knit() itself does?


Setup:

knitr's built-in caching of code chunks is one of its killer features.

It's especially helpful when some chunks contain time-consuming computations. Unless they (or a chunk they depend on) are altered, the computations only need to be carried out the first time the document is knitted: on all subsequent calls to knit(), the objects created by the chunk are simply loaded from the cache.

Here's a minimal-ish example, a file called "lotsOfComps.Rnw":

    \documentclass{article}
    \begin{document}

    The calculations in this chunk take a looooong time.

    <<slowChunk, cache=TRUE>>=
    Sys.sleep(30)  ## Stands in for some time-consuming computation
    x <- sample(1:10, size=2)
    @

    I wish I could `fast-forward' to this chunk, to view the cached value of
    \texttt{x}

    <<interestingChunk>>=
    y <- prod(x)^2
    y
    @

    \end{document}

Times needed to knit and TeXify "lotsOfComps.Rnw":

    ## First time
    system.time(knit2pdf("lotsOfComps.Rnw"))
    ##   user  system elapsed
    ##   0.07    0.02   31.81

    ## Second (and subsequent) runs
    system.time(knit2pdf("lotsOfComps.Rnw"))
    ##   user  system elapsed
    ##   0.03    0.02    1.28

My question:

Within an R session, is there some way to use knitr's cached results to 'fast-forward' to the environment (i.e. the set of objects) available in a given code block, in the same sense that knit() itself does?


Doing purl("lotsOfComps.Rnw") and then running the code in "lotsOfComps.R" doesn't work, because all of the objects along the way must be recomputed.

Ideally, it would be possible to do something like this to end up in the environment that exists at the beginning of <<interestingChunk>>=:

    spin("lotsOfComps.Rnw", chunk="interestingChunk")
    ls()
    # [1] "x"
    x
    # [1] 3 8

Since a spin() that accepts a chunk= argument like this is not (yet?) available, what's the best way to get the equivalent result?

asked Mar 29 '13 at 17:03 by Josh O'Brien




1 Answer

Here is one solution. It is still a little awkward, but it works. The idea is to add a chunk option named mute that defaults to NULL but can also take an R expression, e.g. mute_later() below. When knitr evaluates the chunk options, mute_later() is evaluated and returns NULL; at the same time, it has a side effect on opts_chunk (it sets global chunk options such as eval = FALSE for the chunks that follow).

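Now all you need to do is put mute=mute_later() in the chunk after which you want to skip the remaining chunks; e.g. you can move this option from example-a to example-b. Because mute_later() returns NULL, which happens to be the default value of the mute option, the cache will not be invalidated even if you move this option around. When you then call knit() from the console, the chunks up to that point are loaded from the cache into your workspace (knit() evaluates in parent.frame() by default), and nothing after it is run.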

    \documentclass{article}
    \begin{document}

    <<setup, include=FALSE, cache=FALSE>>=
    rm(list = ls(all.names = TRUE), envir = globalenv())
    opts_chunk$set(cache = TRUE) # enable cache to make it faster
    opts_chunk$set(eval = TRUE, echo = TRUE, include = TRUE)

    # set global options to mute later chunks
    mute_later = function() {
      opts_chunk$set(cache = FALSE, eval = FALSE, echo = FALSE, include = FALSE)
      NULL
    }
    # a global option mute=NULL so that using mute_later() will not break cache
    opts_chunk$set(mute = NULL)
    @

    <<example-a, mute=mute_later()>>=
    x = rnorm(4)
    Sys.sleep(5)
    @

    <<example-b>>=
    y = rpois(10,5)
    Sys.sleep(5)
    @

    <<example-c>>=
    z = 1:10
    Sys.sleep(3)
    @

    \end{document}

It is awkward in the sense that you have to cut and paste the option mute=mute_later() from chunk to chunk. Ideally you would just specify the chunk label, as in the gist I wrote for Barry.

My original gist did not work because chunk hooks are ignored when a chunk is cached. The second time you knit() the file, the checkpoint chunk hook for example-a was skipped, so eval stayed TRUE for the rest of the chunks and you saw all of them evaluated. By contrast, chunk options are always evaluated dynamically, even for cached chunks.

answered Sep 29 '22 at 06:09 by Yihui Xie

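A lower-level route, separate from the mute_later() approach and offered only as a sketch: knitr stores the objects of each cached chunk as a lazy-load database, so base R's lazyLoad() can pull them straight into a session without knitting at all. The helper name load_chunk_cache() is hypothetical, and the sketch assumes knitr's usual cache layout (files named <label>_<hash>.rdb / .rdx inside cache/, the default cache.path for .Rnw documents); check your own cache directory before relying on it.

```r
# Hypothetical helper (not part of knitr): load the objects cached for one
# chunk into `envir`, bypassing knit() entirely.
# Assumes the cache layout <cache_dir>/<label>_<hash>.rdb and .rdx.
load_chunk_cache <- function(label, cache_dir = "cache", envir = globalenv()) {
  pattern <- paste0("^", label, "_[0-9a-f]+\\.rdx$")
  rdx <- list.files(cache_dir, pattern = pattern, full.names = TRUE)
  if (length(rdx) == 0) {
    stop("no cached database found for chunk '", label, "' in ", cache_dir)
  }
  # If more than one version is on disk, use the most recently written one.
  rdx <- rdx[which.max(file.mtime(rdx))]
  # lazyLoad() takes the path without the .rdb/.rdx extension; objects are
  # created as promises in `envir` and read from disk on first access.
  lazyLoad(sub("\\.rdx$", "", rdx), envir = envir)
  invisible(NULL)
}
```

After lotsOfComps.Rnw has been knitted once, calling load_chunk_cache("slowChunk") and then ls() should show x. Unlike knit(), this performs no hash check, so it will load a stale cache if the chunk has been edited since the last knit.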