
knitr/rmarkdown - reducing html file size

I want to produce an HTML document using knitr/rmarkdown. Currently, the file is over 20MB and I'm trying to find a way to reduce it. The large file size is probably due to my plots, which have a lot of points in them.

If I change my output type to PDF, I can get it down to 1.7MB. I'm wondering if there is a way to reduce the file size while keeping the output as HTML.

EDIT: Here's a minimal working example, which I put together in RStudio.

---
title: "Untitled"
author: "My Name"
date: "September 7, 2015"
output: html_document
---

```{r}
library(ggplot2)
knitr::opts_chunk$set(dev='svg')
```

```{r}
set.seed(1)
mydf <- data.frame(x=rnorm(2e4),y=rnorm(2e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```

I also noticed that if I have too many observations, the plot doesn't get generated at all. I just get an empty box with a question mark in the output.

```{r}
set.seed(2)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
# ...plot doesn't appear in output

```

Maria Reyes asked Sep 06 '15 22:09


1 Answer

Following the suggestion of @daroczig to use the "dpi" knitr chunk option, I modified your code as follows (see below).

  • You had set the dev chunk option to "svg", which produces very large vector graphics files, especially for images made up of many elements (points, lines, etc.), because every element is stored individually.
  • I set the dev chunk option back to "png", which is the default raster graphics format for HTML output, so normally you don't need to set it at all. Using "png" instead of "svg" dramatically reduces the HTML output file size.
  • I set the dpi chunk option to 36 (knitr's own default is 72; rmarkdown's html_document sets a higher value unless overridden) to lower the image resolution and decrease the HTML output file size further.
  • I set the out.width and out.height chunk options to "600px", so that the low-resolution image is still displayed at a reasonable size on the page.
  • You can adjust the dpi, out.width, and out.height options until the HTML output file size and the image dimensions are what you want. There's a trade-off between output file size and image resolution.

After knitting the code, I got an HTML output file size equal to 653kB, even when plotting 5e4 data points.
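The effect of the device and resolution on file size can also be checked outside knitr. The following is only a rough sketch: it uses base graphics instead of ggplot2 to stay dependency-free, `render_size()` is a hypothetical helper introduced for illustration, and the exact byte counts will vary by platform and graphics backend.

```r
# Sketch: compare file sizes of one scatter plot across devices and resolutions.
# Base graphics stand in for ggplot2 here (an assumption, to avoid dependencies).
set.seed(1)
x <- rnorm(5e4)
y <- rnorm(5e4)

# render_size() is a hypothetical helper: open a device, draw, close it,
# and return the size of the resulting file in bytes.
render_size <- function(open_device) {
  f <- tempfile()
  on.exit(unlink(f))   # clean up the temporary file afterwards
  open_device(f)
  plot(x, y, pch = 16)
  dev.off()
  file.size(f)
}

png_72 <- render_size(function(f) png(f, width = 7, height = 5, units = "in", res = 72))
png_36 <- render_size(function(f) png(f, width = 7, height = 5, units = "in", res = 36))
cat("png, 72 dpi:", png_72, "bytes\n")
cat("png, 36 dpi:", png_36, "bytes\n")

# svg() requires cairo support, so guard the call.
if (capabilities("cairo")) {
  svg_size <- render_size(function(f) svg(f, width = 7, height = 5))
  cat("svg:", svg_size, "bytes\n")
}
```

A raster file grows with the number of pixels, whereas an SVG file grows with the number of plotted elements, which is why scatter plots with tens of thousands of points are where the two formats diverge most.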

---
title: "Change size of output HTML file by reducing resolution of plot image"
author: "My Name"
date: "September 7, 2015"
output: html_document
---

```{r}
# load ggplot2 without printing startup messages
suppressPackageStartupMessages(library(ggplot2))
# dev="svg" produces very large vector graphics files for plots with many points,
# so use dev="png", the default raster graphics format for HTML output
knitr::opts_chunk$set(dev="png")
```

```{r, dpi=36, out.width="600px", out.height="600px"}
# chunk option dpi=72 is the default resolution
set.seed(1)
mydf <- data.frame(x=rnorm(5e4),y=rnorm(5e4))
ggplot(mydf, aes(x,y)) + geom_point(alpha=0.6)
```
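
If several chunks need the same settings, the options can be set once in a setup chunk rather than repeated in each chunk header. A minimal sketch, assuming the values used in this answer are the ones you want for every plot in the document:

```r
# Set the plot options once for all subsequent chunks.
# The values are those from this answer; adjust them to taste.
knitr::opts_chunk$set(
  dev = "png",          # raster output instead of svg
  dpi = 36,             # lower resolution, smaller embedded images
  out.width = "600px",  # displayed width in the HTML page
  out.height = "600px"  # displayed height in the HTML page
)
```

Options given in an individual chunk header still take precedence over these global values, so a single plot that needs more detail can be rendered with a higher dpi.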
algoquant answered Sep 24 '22 08:09