I am using plot() for over 1 million data points, and it turns out to be very slow.
Is there any way to improve the speed, whether through programming or hardware (more RAM, a better graphics card, ...)?
Where is the data for the plot stored?
(This question is closely related to Scatterplot with too many points, although that question focuses on the difficulty of seeing anything in the big scatterplot rather than on performance issues ...)
A hexbin plot actually shows you something (unlike the scatterplot @Roland proposes in the comments, which is likely to be just one giant, slow blob) and takes about half a second on my machine for your example:
set.seed(101)
a <- rnorm(1e7, 1, 1)
b <- rnorm(1e7, 1, 1)
library(hexbin)
system.time(plot(hexbin(a, b)))  ## 0.5 seconds, modern laptop
Another, slightly slower alternative is the base-R smoothScatter function: it plots a smoothed density estimate plus as many points from the sparsest regions as requested (1000 in this case).
system.time(smoothScatter(a, b, cex = 4, nrpoints = 1000))  ## 3.3 seconds
An easy and fast way is to set pch = '.': each point is then drawn as a tiny rectangle of roughly one pixel, which is much cheaper to render than the default open-circle symbol. The performance comparison is shown below:
> x <- rnorm(10^6)
> system.time(plot(x))
   user  system elapsed
   2.87   15.32   18.74
> system.time(plot(x, pch = 20))
   user  system elapsed
   3.59   22.20   26.16
> system.time(plot(x, pch = '.'))
   user  system elapsed
   1.78    2.26    4.06
Have you looked at the tabplot package? It is designed specifically for large data: http://cran.r-project.org/web/packages/tabplot/ I use it, and it's faster than hexbin (or even the default sunflower plots for overplotting). A minimal call is sketched after this paragraph.
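Here is a minimal sketch of that approach, assuming tabplot's tableplot() entry point as documented on CRAN; the data frame and its column names are made up purely for illustration:

## Hypothetical example: summarise a large data frame with tabplot.
## tableplot() bins the rows and draws one summary bar per bin, so the
## rendering cost depends on the number of bins, not the number of rows.
library(tabplot)

n   <- 1e6
dat <- data.frame(a = rnorm(n), b = rnorm(n))  # illustrative data only
tableplot(dat)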
Also, I think Hadley wrote something on DS's blog about modifying ggplot for big data: http://blog.revolutionanalytics.com/2011/10/ggplot2-for-big-data.html
"I'm currently working with another student, Yue Hu, to turn our research into a robust R package." (October 21, 2011)
Maybe we can ask Hadley if the updated ggplot3 is ready.
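In the meantime, the core idea from that post (summarise the points into bins first, then plot only the summaries) can be sketched with stock ggplot2; geom_bin2d() is a standard geom, so no special package is assumed:

## Sketch of the bin-then-plot idea with plain ggplot2:
## geom_bin2d() counts points per 2-D bin and draws one tile per bin,
## so the device renders a few thousand rectangles instead of 1e7 points.
library(ggplot2)

df <- data.frame(a = rnorm(1e7, 1, 1), b = rnorm(1e7, 1, 1))
ggplot(df, aes(a, b)) + geom_bin2d(bins = 100)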