
How to make code chunks depend on all previous chunks in knitr/rmarkdown?

Tags: r, knitr

Goal

I want to make my data analysis reproducible by having each chunk depend on all previous chunks. So, if there are 3 chunks and I change something in the 1st chunk, the subsequent 2 chunks should re-run so that their outputs reflect the change. I want to set this in the global chunk options at the top of the document so that I don't have to use dependson multiple times.

Problems

With cache=TRUE, a chunk's output does not change unless the chunk itself is modified. For chunks that contain their code inline, I can make each one depend on all previous chunks by putting the following at the top of the document:

```{r setup, echo=FALSE}
# set global chunk options: 
library(knitr)
opts_chunk$set(cache=TRUE, autodep = TRUE)
dep_auto()
```

With this setup, if any chunk is changed, all subsequent chunks are re-run. But this does not work when the chunks use source() to read R scripts. The following is an example document:

---
title: "Untitled"
output: html_document
---
```{r setup, echo=FALSE}
# set global chunk options: 
library(knitr)
opts_chunk$set(cache=TRUE, autodep = TRUE)
dep_auto()
```


# Create Data
```{r}
#source("data1.R")
x <- data.frame(col1 = 4:10, col2 = 6:12)
x
```

# Summaries
```{r}
#source("data2.R")

median1.of.x <- sapply(x, function(x) median(x)-1)

sd.of.x <- sapply(x, sd)

plus.of.x <- sapply(x, function(x) mean(x)+1)

jj <- rbind(plus.of.x, sd.of.x, median1.of.x)

```

```{r}
jj
```

Now, if I change either of the first two chunks, the third chunk shows the correct output after knitting. But if I instead move the first chunk's code into a source file data1.R and the second chunk's into data2.R, call them with source(), and keep the global chunk options the same as before, changes made in the source files are not reflected in the third chunk's output. For example, changing x to x <- data.frame(col1 = 5:11, col2 = 6:12) should yield:

 > jj
                 col1      col2
plus.of.x    9.000000 10.000000
sd.of.x      2.160247  2.160247
median1.of.x 7.000000  8.000000

But when source() is used as described above, the knitted document reports:

 jj
##                col1      col2
## mean.of.x  5.000000  9.000000
## sd.of.x    2.160247  2.160247
## minus.of.x 6.000000 10.000000 

What settings do I need to change to use source() correctly in knitr documents?

umair durrani asked Oct 21 '15 21:10


1 Answer

When you use source(), knitr is unable to analyze the possible objects to be created from it; knitr must be able to see the full source code to analyze the dependencies among code chunks. There are two approaches to solve your problem:

  1. Tell the second chunk that it depends on the value of x by adding an arbitrary chunk option that uses the value of x, e.g. ```{r cache.extra = x}; then, whenever x changes, the cache of this code chunk is automatically invalidated.
  2. Let knitr see the full source code. You can pass the source code to a code chunk via the chunk option code, e.g. ```{r code = readLines('data1.R')} (and the same for data2.R); then dep_auto() should be able to figure out that x was created in the first chunk and used in the second chunk, so the second chunk must depend on the first chunk.
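
As a minimal sketch, the second approach applied to the example document from the question could look like the chunks below. It assumes data1.R and data2.R sit in the same directory as the .Rmd file and that the setup chunk with cache=TRUE and autodep = TRUE is unchanged. The chunk labels create-data, summaries and show-jj are illustrative names, not taken from the question. The first two chunk bodies are left empty on purpose, because the code option supplies their code:

```{r create-data, code = readLines('data1.R')}
```

```{r summaries, code = readLines('data2.R')}
```

```{r show-jj}
# jj is created by data2.R, so this chunk should be picked up as
# depending on the summaries chunk
jj
```

For the first approach, a hedged variant of the second chunk keeps the source() call and adds x as an extra cache key (again, the label is only illustrative):

```{r summaries-sourced, cache.extra = x}
# re-runs whenever the value of x differs from the one stored with the cache
source("data2.R")
```

This variant only reacts to a change in the value of x. It assumes that the chunk creating x is itself re-run when data1.R is edited (for example because that chunk is not cached), and it will not notice an edit to data2.R that leaves x untouched.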
Yihui Xie answered Oct 11 '22 20:10