I am creating a document with knitr and find it tedious to reload the data from disk every time I knit the document during development. I have subsetted the data file for development to shorten the load time, and I have the knitr cache turned on.
I tried assigning the data to the global environment using `<<-`, and checking for it with `exists()` and `where = globalenv()`, but that did not work.
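Roughly what I tried (a sketch; `mydata` and the file name are illustrative):

```{r}
# only reload from disk if the object isn't already in the global environment
if (!exists("mydata", where = globalenv())) {
  mydata <<- read.csv("big_subset.csv")  # assign into the global environment
}
```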
Anyone know how to use preloaded data from the environment in knitr or have other ideas to speed up development?
If you don't want any code chunks to run, you can set `eval = FALSE` in your setup chunk with `knitr::opts_chunk$set()`. If you want only some chunks to run, add `eval = FALSE` to the chunk headers of just those you don't want to run, as in the sketch below.
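A minimal sketch of both approaches (chunk names and the file name are illustrative): a setup chunk turns evaluation off globally, and an individual chunk opts back in with `eval = TRUE`; alternatively, leave the global default alone and mark only the slow chunks with `eval = FALSE`.

```{r setup, include = FALSE}
knitr::opts_chunk$set(eval = FALSE)  # no chunks run by default
```

```{r load-data, eval = TRUE}
# this chunk overrides the global default and does run
dat <- read.csv("subset.csv")
```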
For example, when you `library(tidyverse)` or `library(ggplot2)`, you may see some loading messages. Such messages can also be suppressed by the chunk option `message = FALSE`.
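For example (a minimal sketch):

```{r load-packages, message = FALSE}
library(ggplot2)  # package startup messages are suppressed in the output
```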
knitr is an engine for dynamic report generation with R. It is an R package that enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents. The purpose of knitr is to enable reproducible research in R through literate programming.
When a document is knitted, a new environment is created within R, so any objects in the global environment are not passed to the document. This is intentional: accidentally referencing an object in the global environment is an easy way to break a reproducible analysis, and a clean session each time ensures the R Markdown file runs on its own, regardless of what is in the global environment.
If you do have a use case which justifies preloading the data, there are a few things you can do.
Firstly, I have created a minimal Rmd file called "RenderTest.Rmd":
---
title: "Render"
author: "Michael Harper"
date: "7 November 2017"
output: pdf_document
---
```{r cars}
summary(cars2)
```
In this example, `cars2` is a dataset I am referencing from my global session. Run on its own using the "Knit" command in RStudio, this will return the following error:
```
Error in summary(cars2): object 'cars2' not found
... withCallingHandlers -> withVisible -> eval -> eval -> summary
Execution halted
```
The `render()` function from `rmarkdown` can be called from another R script. By default, it does not create a fresh environment for the script to run in, so you can use any objects already loaded. As an example:
```r
# Build file
library(rmarkdown)
cars2 <- cars
render("RenderTest.Rmd")
```
I would, however, be careful doing this. The benefit of using R Markdown is that it makes reproducibility incredibly easy; as soon as you start relying on external scripts, things become harder to replicate, because not all the settings are contained within the file.
If you have some analysis which takes time to run, you can save the result of the analysis as an R object, and then you can reload the final version of the data into the session. Using my above example:
```{r dataProcess, cache = TRUE}
cars2 <- cars
save(cars2, file = "carsData.RData") # saves the 'cars2' dataset
```
and then we can just reload the data into the session:
```{r}
load("carsData.RData") # reloads the 'cars2' dataset
```
I prefer this technique. The `dataProcess` chunk is cached, so it is only rerun when its code changes. The results are saved to file and then loaded by the next chunk. The data still has to be loaded into the session, but you can save the finalised dataset if you need to do any data cleaning first.
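A closely related variant (my own suggestion, not part of the approach above) is `saveRDS()`/`readRDS()`, which stores a single object and lets you choose its name on reload:

```{r dataProcessRDS, cache = TRUE}
cars2 <- cars
saveRDS(cars2, "carsData.rds")  # stores the one object, without its name
```

```{r}
cars2 <- readRDS("carsData.rds")  # reload under whatever name you like
```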
With the updates made to RStudio over the past few years, there is less need to continuously rebuild the file: chunks can be run directly within the file and the output viewed inline. You could otherwise spend a lot of time optimising the script only to save a couple of minutes of compiling (which normally makes a good time to get a hot drink anyway!).