
Speed up plot() function for large dataset

Tags: plot, r

I am using plot() on more than 1 million data points, and it turns out to be very slow.

Is there any way to improve the speed, whether through programming or hardware (more RAM, a graphics card, ...)?

Where is the data for a plot stored?

SilverSpoon asked Jun 08 '12 08:06


3 Answers

(This question is closely related to Scatterplot with too many points, although that question focuses on the difficulty of seeing anything in the big scatterplot rather than on performance issues ...)

A hexbin plot actually shows you something (unlike the scatterplot @Roland proposes in the comments, which is likely to just be a giant, slow blob) and is fast for your example; the timing on my machine is in the code comment below:

set.seed(101)
a <- rnorm(1e7, 1, 1)
b <- rnorm(1e7, 1, 1)
library(hexbin)
system.time(plot(hexbin(a, b)))  ## 0.5 seconds, modern laptop


Another, slightly slower alternative is the base-R smoothScatter function: it plots a smoothed density plus as many of the most extreme points as requested (1000 in this case).

system.time(smoothScatter(a, b, cex = 4, nrpoints = 1000))  ## 3.3 seconds
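Both approaches above work by summarising the points into bins before drawing. As a rough illustration of that idea using only base R, the sketch below counts points on a rectangular grid and draws the counts with image(). The helper name bin_counts and the choice of 100 bins per axis are assumptions made for this example. They are not part of hexbin or smoothScatter, which use hexagonal bins and a kernel density estimate respectively.

```r
# Sketch: bin a large scatter into a grid of counts with base R only.
# bin_counts() is a hypothetical helper, not part of any package.
bin_counts <- function(x, y, nbin = 100) {
  xbreaks <- seq(min(x), max(x), length.out = nbin + 1)
  ybreaks <- seq(min(y), max(y), length.out = nbin + 1)
  # findInterval() maps each value to a bin index in 1..nbin
  ix <- findInterval(x, xbreaks, rightmost.closed = TRUE)
  iy <- findInterval(y, ybreaks, rightmost.closed = TRUE)
  # Count points per cell; cells are indexed in column-major order
  counts <- matrix(tabulate((iy - 1) * nbin + ix, nbins = nbin * nbin),
                   nrow = nbin, ncol = nbin)
  list(x = xbreaks, y = ybreaks, z = counts)
}

set.seed(101)
a <- rnorm(1e6, 1, 1)
b <- rnorm(1e6, 1, 1)
g <- bin_counts(a, b)
# image() draws nbin^2 cells instead of one glyph per data point;
# log1p() keeps sparse outer cells visible next to the dense centre
image(g$x, g$y, log1p(g$z), xlab = "a", ylab = "b")
```

The drawing cost here depends on the number of cells rather than the number of points, which is why binned displays tend to stay fast as the data grows.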


Ben Bolker answered Oct 19 '22 04:10


An easy and fast way is to set pch='.'. The timings are shown below:

> x=rnorm(10^6)
> system.time(plot(x))
  user  system elapsed 
  2.87   15.32   18.74 
> system.time(plot(x,pch=20))
  user  system elapsed 
  3.59   22.20   26.16 
> system.time(plot(x,pch='.'))
  user  system elapsed 
  1.78    2.26    4.06 
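
To repeat this comparison on your own machine, something like the following sketch may help. The helper time_pch is a hypothetical name introduced here, and it draws to a temporary PDF file rather than a screen window, so the absolute numbers will not match the on-screen timings above; only the relative ordering is of interest. The sample size is also reduced to 10^5 so the sketch runs quickly.

```r
# Sketch: time plot() for several point symbols on an off-screen device.
# time_pch() is a hypothetical helper; it writes to a temporary PDF,
# so timings reflect file output, not an on-screen graphics window.
time_pch <- function(x, pch) {
  f <- tempfile(fileext = ".pdf")
  pdf(f)
  on.exit({ dev.off(); unlink(f) })
  unname(system.time(plot(x, pch = pch))["elapsed"])
}

set.seed(1)
x <- rnorm(1e5)  # smaller than the 10^6 used above, to keep the run short
timings <- c(default = time_pch(x, 1),
             pch20   = time_pch(x, 20),
             dot     = time_pch(x, "."))
print(timings)
```

Results depend heavily on the graphics device and operating system, so it is worth running the same function against the device you actually use.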

TPArrow answered Oct 19 '22 04:10


Have you looked at the tabplot package? It is designed specifically for large data: http://cran.r-project.org/web/packages/tabplot/. I use it, and it is faster than hexbin (or even the default sunflower plots for overplotting).

Also, I think Hadley wrote something on DS's blog about modifying ggplot for big data: http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html

"I'm currently working with another student, Yue Hu, to turn our research into a robust R package." (October 21, 2011)

Maybe we can ask Hadley whether the updated ggplot2 is ready.

Ajay Ohri answered Oct 19 '22 02:10