I'm trying to plot a few million data points in R. I'm currently using ggplot2 (but I'm open to suggestions of alternative packages). The problem is that the graph takes too long to render (often upwards of a minute). I'm looking for ways to do this faster -- ideally in real time. I'd appreciate any help -- I'm attaching code for clarity.
Creating a random data frame with 100,000 rows:
letters <- c("A", "B", "C", "D", "E", "F", "G")
myLetters <- sample(x = letters, size = 100000, replace = TRUE)
direction <- c("x", "y", "z")
factor1 <- sample(x = direction, size = 100000, replace = TRUE)
factor2 <- runif(100000, 0, 20)
factor3 <- runif(100000, 0, 100)
decile <- sample(x = 1:10, size = 100000, replace = TRUE)
new.plot.df <- data.frame(letters = myLetters, factor1 = factor1, factor2 = factor2,
                          factor3 = factor3, decile = decile)
Now, plotting the data:
library(ggplot2)
color.plot <- ggplot(new.plot.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(aes(alpha = factor2)) +
  facet_grid(decile ~ letters)
color.plot
How do I make the rendering faster?
As of 2022, the best solution is to use DuckDB (there is an R connector): it lets you query very large datasets (CSV, Parquet, among others) and provides many functions for computing summary statistics. The idea is to compute those statistics in DuckDB, load only the summaries into R/Python/Julia, and plot those.
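A minimal sketch of that workflow, assuming the duckdb and DBI packages are installed; the file points.csv and its columns grp and y are placeholders generated here to stand in for your own large dataset:

```r
library(DBI)
library(duckdb)

# Stand-in for a large CSV on disk (hypothetical columns grp, y):
write.csv(data.frame(grp = sample(c("a", "b"), 1e5, replace = TRUE),
                     y   = runif(1e5)),
          "points.csv", row.names = FALSE)

con <- dbConnect(duckdb::duckdb())

# Aggregate inside DuckDB so only the summary rows reach R:
summary.df <- dbGetQuery(con, "
  SELECT grp, avg(y) AS mean_y, count(*) AS n
  FROM read_csv_auto('points.csv')
  GROUP BY grp
")
dbDisconnect(con, shutdown = TRUE)

# summary.df now has one row per group, so plotting it is instant, e.g.
# ggplot(summary.df, aes(grp, mean_y)) + geom_col()
```

The key point is that the raw points never enter an R data frame; only the handful of summary rows do.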
There are two main sources of slowness in R plotting:
The graphical back-end can be changed with the appropriate device-opening and backend-selection commands -- for me, this usually helps:
options(bitmapType='cairo') #set the drawing backend, this may speed up PNG rendering
x11(type='cairo') #drawing to X11 window using cairo is the fastest interactive output for me
(X11 is not available on Windows and is a little confusing in RStudio, but that's a different story.)
Plotting simpler shapes helps quite a lot. ggplot2 uses some variant of pch=19 or pch=20 by default, which are very slow to draw because of anti-aliasing. You can usually get about 10x faster rendering with pch='.' (a single non-anti-aliased pixel) or pch=16 (a small non-anti-aliased circle). The same applies to ggplot2 via shape='.' and shape=16, respectively. If you have a lot of points and set an appropriately low alpha, you get the "anti-aliasing" for free.
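As a concrete sketch of that advice (ggplot2 assumed installed; the data frame mirrors the one built in the question):

```r
library(ggplot2)

n <- 100000
new.plot.df <- data.frame(
  letters = sample(c("A", "B", "C", "D", "E", "F", "G"), n, replace = TRUE),
  factor1 = sample(c("x", "y", "z"), n, replace = TRUE),
  factor2 = runif(n, 0, 20),
  factor3 = runif(n, 0, 100),
  decile  = sample(1:10, n, replace = TRUE)
)

# shape = '.' draws one non-anti-aliased pixel per point; a fixed low
# alpha gives a smooth, anti-aliased look essentially for free.
fast.plot <- ggplot(new.plot.df, aes(x = factor3, y = factor2, color = factor1)) +
  geom_point(shape = '.', alpha = 0.3) +
  facet_grid(decile ~ letters)
```

Printing fast.plot renders it; the speedup comes entirely from the cheaper point glyph, so the rest of the plot specification is unchanged.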
For me, just switching the graphical backend and using a different point shape cut the drawing time for 1 million points from around 30 minutes to a few seconds; 500k data points should render in under a second.
EDIT (Jan 2020): I recently made a library that speeds this up even more: https://github.com/exaexa/scattermore
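For completeness, a sketch using scattermore's ggplot2 geom (assuming the scattermore and ggplot2 packages are installed; the random data is illustrative):

```r
library(ggplot2)
library(scattermore)

xy <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

# geom_scattermore() rasterizes the points into a bitmap before handing
# them to the graphics device, so a million points draw almost instantly.
p <- ggplot(xy, aes(x, y)) +
  geom_scattermore(pointsize = 1)
```

It slots into an existing ggplot2 pipeline as a near drop-in replacement for geom_point().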
In general there are two strategies that I use for this:
1) As described in the comments, taking a reasonably representative sample of your data will barely change the look of the plot while greatly reducing the number of points to render.
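For instance, keeping a random 10% of the rows with base R (the fraction is a judgment call for your data; the small data frame here stands in for the question's new.plot.df):

```r
# Stand-in for the question's data frame:
new.plot.df <- data.frame(factor2 = runif(1e5, 0, 20),
                          factor3 = runif(1e5, 0, 100))

# Keep a random 10% of the rows; for most scatter plots the overall
# shape of the cloud is preserved, but rendering is ~10x cheaper.
keep <- sample(nrow(new.plot.df), size = round(0.1 * nrow(new.plot.df)))
sampled.df <- new.plot.df[keep, ]
```

You then pass sampled.df to ggplot() in place of the full data frame.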
2) One trick I use is to create the plot object without displaying it and save it straight to a PNG file instead. This speeds things up considerably, because when you open the image it is a raster rather than a vector image.
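A sketch of that approach with ggsave() (ggplot2 assumed installed; the data, file name, and dimensions here are illustrative):

```r
library(ggplot2)

df <- data.frame(x = runif(1e5), y = runif(1e5))  # stand-in data
color.plot <- ggplot(df, aes(x, y)) +
  geom_point(shape = '.')

# Render straight to a raster file instead of an on-screen device;
# opening the finished PNG is instant, since it is pixels, not vectors.
ggsave("color_plot.png", color.plot, width = 8, height = 6, dpi = 150)
```

The expensive rendering happens once, off-screen, and the resulting file can be viewed or embedded without re-drawing the points.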