Avoid loading data every time in knitr

Q: How do you not run a chunk of code in R markdown?

If you don't want any code chunks to run, you can add eval = FALSE in your setup chunk with knitr::opts_chunk$set(). If you want to skip only some chunks, add eval = FALSE to the chunk headers of just those you don't want to run.

Q: How do I turn off library messages in RMarkdown?

For example, when you call library(tidyverse) or library(ggplot2), you may see some loading messages. Such messages can be suppressed with the chunk option message = FALSE.

Q: What is the purpose of knitr?

knitr is an engine for dynamic report generation with R. It is a package in the programming language R that enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents. The purpose of knitr is to allow reproducible research in R through the means of literate programming.

Tags: r, knitr, r-markdown

I am creating a document using knitr and I am finding it tedious to reload the data from disk every time I parse the document while I'm in development. I've subsetted that datafile for development to shorten the load time. I also have knitr cache set to on.

I tried assigning the data to the global environment using <<-, and using exists with where=globalenv(), but that did not work.

Anyone know how to use preloaded data from the environment in knitr or have other ideas to speed up development?

asked Sep 21 '14 19:09

Daniel

1 Answer

When a document is knitted, a new environment is created within R, so nothing in the global environment is passed to the document. This is intentional: accidentally referencing an object in the global environment is an easy way to break a reproducible analysis, and starting from a clean session each time means the RMarkdown file runs on its own, regardless of what the global environment contains.

If you do have a use case which justifies preloading the data, there are a few things you can do.

Example Data

Firstly I have created a minimal Rmd file as below called "RenderTest.Rmd":

---
title: "Render"
author: "Michael Harper"
date: "7 November 2017"
output: pdf_document
---

```{r cars}
summary(cars2)
```

In this example, cars2 is a dataset I am referencing from my global session. Run on its own using the "Knit" command in RStudio, this will return the following error:

Error in summary(cars2): object 'cars2' not found: ... withCallingHandlers -> withVisible -> eval -> eval -> summary Execution halted

Option 1: Manually Call the render function

The render function from rmarkdown can be called from another R script. By default it does not create a fresh environment for the document to run in, so you can use any objects already loaded in the calling session. As an example:

# Build file
library(rmarkdown)

cars2 <- cars              # create the object in the calling session
render("RenderTest.Rmd")   # the document can now find cars2

I would, however, be careful doing this. The benefit of using RMarkdown is that it makes the script incredibly easy to reproduce. As soon as you start using external scripts, things become more complicated to replicate, as not all the settings are contained within the file.
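One way to soften that trade-off is to guard the load with exists(), so the document reuses a preloaded object when there is one and still loads the data itself otherwise. The following is a minimal sketch of a chunk body, not a definitive approach: load_cars2() is a hypothetical helper standing in for the slow read from disk. The guard only saves time when the document is rendered from a session that already holds cars2 (for example through render() as above), because the "Knit" button starts from a clean session.

```r
# Hypothetical helper: stands in for a slow read from disk
load_cars2 <- function() {
  cars
}

# Reuse cars2 if the rendering session already has it, otherwise load it
if (!exists("cars2")) {
  cars2 <- load_cars2()
}

summary(cars2)
```

With this guard the file still knits on its own, so the external build script becomes an optional speed-up rather than a requirement.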

Option 2: Save data to an R object

If you have some analysis which takes time to run, you can save the result of the analysis as an R object, and then you can reload the final version of the data into the session. Using my above example:

```{r dataProcess, cache = TRUE}
cars2 <- cars
save(cars2, file = "carsData.RData") # saves the 'cars2' dataset
```
and then we can just reload the data into the session:

```{r}
load("carsData.RData") # reloads the 'cars2' dataset
```

I prefer this technique. The chunk dataProcess is cached, so it is only rerun when its code changes. The results are saved to a file, which is then loaded by the next chunk. The data still has to be loaded into the session, but if you need to do any data cleaning you can save the finalised dataset and skip that step on later runs.
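A variation on the same idea, sketched under assumptions rather than taken from the example above: saveRDS() and readRDS() store a single object, and a file.exists() check skips the slow step whenever the saved file is already there. The names cache_file, load_slow_data and get_data are hypothetical, and the file is written to tempdir() only to keep the sketch self-contained; a real project would use a path that survives between sessions.

```r
# Hypothetical location for the saved dataset
cache_file <- file.path(tempdir(), "cars2.rds")

# Hypothetical helper: stands in for slow loading and cleaning
load_slow_data <- function() {
  cars
}

# Read the saved copy if it exists, otherwise build it and save it
get_data <- function(path) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  dat <- load_slow_data()
  saveRDS(dat, path)
  dat
}

cars2 <- get_data(cache_file)
```

Deleting the saved file forces the data to be rebuilt on the next run, which is worth doing whenever the source data changes.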

Option 3: Build the file less frequently

With the updates made to RStudio over the past few years, there is less of a need to continuously rebuild the file. Chunks can be run directly within the file, and the output viewed inline. This can spare you a lot of time spent optimising the script just to save a couple of minutes on compiling (which normally makes a good time to get a hot drink anyway!).

answered Oct 21 '22 08:10

Michael Harper
