 

How to produce a markdown document for each row of a dataframe in R

Tags:

r

knitr

I would like to produce either one markdown document with a subdocument for each row of a dataframe, or nrow(dataFrame) separate markdown documents. The markdown template is template.Rmd.

I think a loop should work, but when I try this, by(dataFrame, 1:nrow(dataFrame), function(row) knit(file = "/Users/path/template.Rmd")), I get the following error:

Quitting from lines 23-26 (Preview-e0d353674d36.Rmd) 
Error in knit(file = "/Users/path/template.Rmd") : 
  unused argument (file = "/Users/path/template.Rmd")
Calls: <Anonymous> ... eval -> eval -> tapply -> lapply -> FUN -> FUN -> knit

Execution halted
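The error itself comes from the argument name: knitr's knit() takes the path to the template as its first argument, input, and has no file argument, which is what "unused argument" is reporting. Below is a minimal sketch of the row loop with that fixed. The dataFrame with name and value columns and the tiny stand-in template written to tempdir() are hypothetical; each iteration also gets its own output file so the reports don't overwrite each other.

```r
library(knitr)

# Hypothetical data standing in for the question's dataFrame
dataFrame <- data.frame(name = c("a", "b", "c"), value = c(1, 2, 3),
                        stringsAsFactors = FALSE)

# Hypothetical stand-in for template.Rmd; it reads the current row from `row`
template <- file.path(tempdir(), "template.Rmd")
writeLines(c("# Report for `r row$name`",
             "",
             "The value is `r row$value`."), template)

outputs <- character(0)
for (i in seq_len(nrow(dataFrame))) {
  row <- dataFrame[i, ]
  out <- file.path(tempdir(), paste0("report-", i, ".md"))
  # The template path goes in `input`; inline code is evaluated in `envir`,
  # here the environment of this loop, so the template can see `row`
  knit(input = template, output = out, envir = environment(), quiet = TRUE)
  outputs <- c(outputs, out)
}
```

The same fix applies inside by(): there knit()'s default envir is the anonymous function's environment, which already contains row, but each call still needs a distinct output.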

I tried the same awesome approach @Yihui gave for programmatically generating text with knit_expand(), detailed here: R knitr: Possible to programmatically modify chunk labels?

That solution uses two .Rmd files, the report and the template. The report looks like:

# My report

```{r}
data(mtcars)
cyl.levels <- unique(mtcars$cyl)
```

## Generate report for each level of cylinder variable
```{r, include=FALSE}
src <- lapply(cyl.levels, function(ncyl) knit_expand(file = "template.Rmd"))
```

`r knit(text = unlist(src))`

The template (template.Rmd) looks like:

```{r, results='asis'}
cat("### {{ncyl}} cylinders")
```

```{r mpg-histogram-{{ncyl}}cyl}
hist(mtcars$mpg[mtcars$cyl == {{ncyl}}], 
  main = paste({{ncyl}}, "cylinders"))
```

```{r weight-histogram-{{ncyl}}cyl}
hist(mtcars$wt[mtcars$cyl == {{ncyl}}], 
  main = paste({{ncyl}}, "cylinders"))
```
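It may help to look at what knit_expand() actually returns in the report above. The minimal sketch below uses a hypothetical one-line stand-in for template.Rmd, passed through text =, and shows that each {{ncyl}} is replaced by the value of ncyl as plain text before anything is knitted; strings like these are what knit(text = unlist(src)) later evaluates.

```r
library(knitr)

data(mtcars)
cyl.levels <- unique(mtcars$cyl)

# Hypothetical one-line stand-in for template.Rmd
tmpl <- 'cat("### {{ncyl}} cylinders")'

# Same pattern as the report, but with the template given as text;
# ncyl is passed explicitly so knit_expand() can resolve {{ncyl}}
src <- lapply(cyl.levels, function(ncyl) knit_expand(text = tmpl, ncyl = ncyl))
print(unlist(src))
```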

This solution produces a single markdown document with a subsection (at heading level 3) for each cylinder level. However, I am trying to create a report that reads a .csv, creates and modifies a dataframe from it, and then produces content for each row of another dataframe.

What I think I am stuck on is how to use the value in {{ncyl}} to programmatically refer to rows of a dataframe. I would like to use the levels of {{ncyl}} to do things with the related rows of mtcars (assuming, for this example, that it had only as many rows as there are levels of {{ncyl}}).

Although mtcars has more rows than there are cylinder levels, {{ncyl}} is replaced by a plain number, so you can write mtcars$gear[[{{ncyl}}]] and get the value of gear in row {{ncyl}}.
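One caveat: because {{ncyl}} becomes a bare number, mtcars$gear[[{{ncyl}}]] indexes by position (rows 6, 4 and 8), not by cylinder level. A base-R sketch of the difference, with ncyl fixed at 6 for illustration:

```r
data(mtcars)
ncyl <- 6

# Positional: the gear value of the 6th row of mtcars (the Valiant)
gear_by_position <- mtcars$gear[[ncyl]]

# By level: the gear values of every car whose cyl equals 6
gear_by_level <- mtcars$gear[mtcars$cyl == ncyl]

# All rows belonging to this level, e.g. to build one section per level
rows_for_level <- subset(mtcars, cyl == ncyl)
print(gear_by_level)
```

For content per group of rows, the per-level subset is usually what a template needs; the positional form only lines up when the level values happen to equal row numbers.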

Why, then, does it fail when we add that to template.Rmd?

Forgive me, it doesn't fail: gear <- mtcars$gear[[{{ncyl}}]] works, but we cannot then use gear in a chunk label, like ```{r this-gear-{{gear}}}.

This works:

```{r}
gear <- mtcars$gear[[{{ncyl}}]]
gear
```

```{r, results='asis'}
cat("### {{ncyl}} cylinders")
```

```{r mpg-histogram-{{ncyl}}cyl}
hist(mtcars$mpg[mtcars$cyl == {{ncyl}}], 
  main = paste({{ncyl}}, "cylinders"))
```

```{r weight-histogram-{{ncyl}}cyl}
hist(mtcars$wt[mtcars$cyl == {{ncyl}}], 
  main = paste({{ncyl}}, "cylinders"))
```

This does not work:

```{r}
gear <- mtcars$gear[[{{ncyl}}]]
gear
```

```{r, results='asis'}
cat("### {{ncyl}} cylinders")
```

```{r mpg-histogram-{{ncyl}}cyl}
hist(mtcars$mpg[mtcars$cyl == {{ncyl}}], 
  main = paste({{ncyl}}, "cylinders"))
```

```{r weight-histogram-{{ncyl}}cyl}
hist(mtcars$wt[mtcars$cyl == {{ncyl}}], 
  main = paste({{ncyl}}, "cylinders"))
```
```{r {{gear}}}
gear
```

This gives the error:

Quitting from lines 10-12 (Preview-e0d32d687661.Rmd) 
Error in eval(expr, envir, enclos) : object 'gear' not found
Calls: <Anonymous> ... knit_expand -> inline_exec -> withVisible -> eval -> eval
Execution halted
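As for why {{gear}} fails: knit_expand() evaluates every {{...}} expression while it builds the text, i.e. before any chunk has run, and gear is only assigned later, when knit() evaluates the expanded chunk. The traceback (knit_expand -> inline_exec -> eval) points at exactly that step. One way around it, sketched below with a hypothetical one-chunk stand-in template, is to compute the value inside the lapply() call and hand it to knit_expand() as a named argument. ncyl is also put in the label because gear repeats across levels and knitr requires chunk labels to be unique.

```r
library(knitr)

data(mtcars)
cyl.levels <- unique(mtcars$cyl)

# Hypothetical stand-in for template.Rmd: a chunk whose label uses {{gear}}
tmpl <- c("```{r this-gear-{{ncyl}}-{{gear}}}",
          "gear <- {{gear}}",
          "gear",
          "```")

src <- lapply(cyl.levels, function(ncyl) {
  # gear is computed here, before expansion, so {{gear}} can be resolved
  knit_expand(text = tmpl, ncyl = ncyl, gear = mtcars$gear[[ncyl]])
})
cat(unlist(src), sep = "\n")
```

The expanded text in src can then be passed to knit(text = unlist(src)) exactly as in the original report.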

I suspect I am approaching the main problem ("How do I create a markdown document for each row of a dataframe?") the wrong way by using knit_expand().

Can someone help me understand:

1. How to solve the main problem?
2. Why {{gear}} does not work within template.Rmd?

So, I still don't understand (2), but I think @daroczig has gotten me close to one way to solve the main problem. I don't think this is a particularly unusual problem, and I assume there is a way to solve it without brew, pander or rapport. In any case, I took the brew approach and applied it to a few rows of a dataframe. It throws an error. Note that I am not doing anything sensible with this code: it just limits mtcars to 3 rows so I don't get too much output, and then creates another throwaway dataframe inside the for loop.

# My report

<%
mtcars1 <- mtcars[1:3,]
mtcars1$type <- c('red','blue','green')
t.levels <- unique(mtcars1$type)
for (ty in t.levels) {
p <- subset(mtcars1,type == ty) 
x <- rep(p, 4)
short <- paste0(p$gear, p$mpg)
%>

### <%= short %> blah

<%=
hist(x$mpg, main = paste(short, "blah"))
%>

<% } %>

This is just a small, lame modification of the solution @daroczig proposed below. It works if we save it as demo.brew and call Pandoc.brew('demo.brew', output = tempfile(), convert = 'html'), which produces one silly example document.

(3) Is there an example of how to do this without brew? I'm curious.

Answer to (3): yes. This works, using a for loop over the values of a variable instead of over row numbers:

library(knitr)

varlist <- unique(df$variable)
for (var in varlist) {
    # template.Rmd can refer to `var`; knit2html() evaluates it in the calling environment
    try(knit2html(input = '/Users/path/template.Rmd',
                  output = paste0('/Users/path/template', var, '.html')))
}

This works where the loop over 1:nrow() did not.
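A variation on the same loop, sketched with a hypothetical df and stand-in template: instead of letting the template read var from the global environment, each iteration builds a fresh environment holding just the rows for that value and passes it to knit() through its envir argument. This keeps the runs from sharing state, and the template only needs to know the (hypothetical) name sub.

```r
library(knitr)

# Hypothetical data: several rows per value of `variable`
df <- data.frame(variable = c("x", "x", "y"), value = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Hypothetical stand-in for template.Rmd; it only refers to `sub`
template <- file.path(tempdir(), "template.Rmd")
writeLines("Rows: `r nrow(sub)`, total: `r sum(sub$value)`", template)

for (var in unique(df$variable)) {
  # Fresh environment per document; it inherits from the global environment,
  # so functions such as nrow() and sum() are still found
  env <- new.env(parent = globalenv())
  env$sub <- df[df$variable == var, , drop = FALSE]
  knit(input = template,
       output = file.path(tempdir(), paste0("template-", var, ".md")),
       envir = env, quiet = TRUE)
}
```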

Asked by jessi on Nov 01 '22 at 18:11


1 Answer

An alternative solution with pander, based on my comment above:

# My report

<%
cyl.levels <- unique(mtcars$cyl)
for (ncyl in cyl.levels) {
%>

### <%= ncyl %> cylinders

<%=
hist(mtcars$mpg[mtcars$cyl == ncyl], main = paste(ncyl, "cylinders"))
hist(mtcars$wt[mtcars$cyl == ncyl], main = paste(ncyl, "cylinders"))
%>

<% } %>

To brew this file (saved as demo.brew), run:

Pandoc.brew('demo.brew')

Or, to get e.g. an MS Word document:

Pandoc.brew('demo.brew', output = tempfile(), convert = 'docx')

Update: I've just realized that you need separate documents for the categories. To that end, I'd suggest giving my other package, rapport, a try; it focuses exactly on statistical report templates. Quick example:

<!--head
meta:
  title: Demo for @Jessi
  author: daroczig
  description: This is a demo
  packages: ~
inputs:
- name: ncyl
  class: integer
  standalone: TRUE
  required: TRUE
head-->

### <%= ncyl %> cylinders

<%=
hist(mtcars$mpg[mtcars$cyl == ncyl], main = paste(ncyl, "cylinders"))
hist(mtcars$wt[mtcars$cyl == ncyl], main = paste(ncyl, "cylinders"))
%>

So the document above (demo.rapport) is a rapport template: it has a YAML header for the metadata and the inputs (which act like the arguments of an R function), and its body can include markdown and R code in brew syntax, rendered with pander. Now you can run this template with a simple call, e.g. for 4 cylinders:

> rapport('demo.rapport', ncyl = 4)

### _4_ cylinders

![](plots/rapport--home-daroczig-projects-demo.rapport-6-1.png)
![](plots/rapport--home-daroczig-projects-demo.rapport-6-2.png)

And to produce an MS Word file for each cylinder level, try this:

for (ncyl in (2:4)*2) {
    rapport.docx('/home/daroczig/projects/demo.rapport', ncyl = ncyl)
}
Answered by daroczig on Nov 09 '22 at 10:11