Fastest way to sort each row of a large matrix in R

I have a large matrix:

set.seed(1)
a <- matrix(runif(9e+07),ncol=300)

I want to sort each row in the matrix:

> system.time(sorted <- t(apply(a,1,sort)))
   user  system elapsed 
  42.48    3.40   45.88 

I have a lot of RAM to work with, but I would like a faster way to perform this operation.

Zach asked Feb 29 '12 20:02


1 Answer

Well, I'm not aware of many ways to sort faster in R. The difficulty is that each sort covers only 300 values but is repeated 300,000 times, once per row. Still, you can eke some extra performance out of sort by calling sort.int directly with method='quick':

set.seed(1)
a <- matrix(runif(9e+07),ncol=300)

# Your original code
system.time(sorted <- t(apply(a,1,sort))) # 31 secs

# sort.int with method='quick'
system.time(sorted2 <- t(apply(a,1,sort.int, method='quick'))) # 27 secs

# a for loop is slightly faster than apply (and avoids the transpose):
system.time({
  sorted3 <- a
  for (i in seq_len(nrow(a))) sorted3[i, ] <- sort.int(a[i, ], method='quick')
}) # 26 secs
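
As a sanity check, the three variants above can be compared on a small matrix before timing them on the full one. This is only a sketch: the names small, by_apply, by_quick and by_loop are made up for illustration and do not appear in the original code. It also shows why the apply versions need t(): apply returns each sorted row as a column.

```r
set.seed(1)
small <- matrix(runif(3000), ncol = 300)  # 10 rows instead of 300,000

# apply() returns one column per input row, hence the t()
by_apply <- t(apply(small, 1, sort))
by_quick <- t(apply(small, 1, sort.int, method = "quick"))

# the loop writes each sorted row back in place, so no transpose is needed
by_loop <- small
for (i in seq_len(nrow(small))) {
  by_loop[i, ] <- sort.int(small[i, ], method = "quick")
}

# all three variants should hold the same values in the same positions
stopifnot(all(by_apply == by_quick), all(by_apply == by_loop))

# every row should be in non-decreasing order
stopifnot(all(apply(by_loop, 1, function(r) !is.unsorted(r))))
```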

A better approach should be to use the parallel package to sort parts of the matrix in parallel. In practice, though, the overhead of transferring the data to the workers seems to be too large, and on my machine it starts swapping since I "only" have 8 GB of memory:

library(parallel)
cl <- makeCluster(4)
system.time(sorted4 <- t(parApply(cl,a,1,sort.int, method='quick'))) # Forever...
stopCluster(cl)
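
One possible way to reduce the transfer overhead, which I have not timed on the full matrix, is to fork the workers with mclapply instead of using a socket cluster. Forked workers see the parent's copy of the matrix, so only the sorted blocks have to be sent back. This is a sketch under assumptions: sort_rows_forked is a hypothetical helper, not part of the parallel package, and forking is not available on Windows, where the code falls back to a single core.

```r
library(parallel)

# Hypothetical helper: sort the rows of m in blocks, one block per task.
sort_rows_forked <- function(m, cores = 2L) {
  # mclapply relies on forking, which Windows does not support
  if (.Platform$OS.type == "windows") cores <- 1L

  # split the row indices into `cores` roughly equal blocks
  blocks <- splitIndices(nrow(m), cores)

  parts <- mclapply(blocks, function(idx) {
    block <- m[idx, , drop = FALSE]
    for (i in seq_len(nrow(block))) {
      block[i, ] <- sort.int(block[i, ], method = "quick")
    }
    block
  }, mc.cores = cores)

  # stack the sorted blocks back together in their original row order
  do.call(rbind, parts)
}

# usage on a small matrix
set.seed(1)
small <- matrix(runif(3000), ncol = 300)
sorted_small <- sort_rows_forked(small, cores = 2L)
```

Whether this actually beats the plain for loop depends on the machine; the result still has to be copied back from each worker, so memory use remains a concern for a matrix of this size.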
Tommy answered Oct 08 '22 04:10