Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to speed up this R code

Tags:

r

plyr

I have a data.frame (link to file) with 18 columns and 11520 rows that I transform like this:

library(plyr)
df.median<-ddply(data, .(groupname,starttime,fPhase,fCycle), 
                 numcolwise(median), na.rm=TRUE)

according to system.time(), it takes about this long to run:

   user  system elapsed 
   5.16    0.00    5.17

This call is part of a webapp, so run time is pretty important. Is there a way to speed this call up?

like image 212
dnagirl Avatar asked Oct 19 '10 18:10

dnagirl


People also ask

Why is my R code so slow?

Beyond performance limitations due to design and implementation, it has to be said that a lot of R code is slow simply because it's poorly written. Few R users have any formal training in programming or software development.

Which function is faster in terms of performance in R?

table function is fastest, followed by the tidyverse version and then the base R function. By calculating the relative speeds, we can see that compared to the data. table function, the base R function is almost 4 times and the dplyr function is 3 times slower!

Is R faster than Rstudio?

I just noticed that the same codes run much faster in R launched from the server's terminal than in Rstudio Server, and the difference is quite significant. For example, I wrote a heavy R script that handles a large amount of data (>10GB).


1 Answers

Just using aggregate is quite a bit faster...

> groupVars <- c("groupname","starttime","fPhase","fCycle")
> dataVars <- colnames(data)[ !(colnames(data) %in% c("location",groupVars)) ]
> 
> system.time(ag.median <- aggregate(data[,dataVars], data[,groupVars], median))
   user  system elapsed 
   1.89    0.00    1.89 
> system.time(df.median <- ddply(data, .(groupname,starttime,fPhase,fCycle), numcolwise(median), na.rm=TRUE))
   user  system elapsed 
   5.06    0.00    5.06 
> 
> ag.median <- ag.median[ do.call(order, ag.median[,groupVars]), colnames(df.median)]
> rownames(ag.median) <- 1:NROW(ag.median)
> 
> identical(ag.median, df.median)
[1] TRUE
like image 142
Joshua Ulrich Avatar answered Sep 21 '22 06:09

Joshua Ulrich