I have a large matrix:
set.seed(1)
a <- matrix(runif(9e+07),ncol=300)
I want to sort each row in the matrix:
> system.time(sorted <- t(apply(a,1,sort)))
user system elapsed
42.48 3.40 45.88
I have a lot of RAM to work with, but I would like a faster way to perform this operation.
Well, I'm not aware of many ways to sort faster in R, and the problem is that you're only sorting 300 values at a time, but doing it many times (300,000 rows). Still, you can eke some extra performance out of sort by calling sort.int directly and using method='quick':
set.seed(1)
a <- matrix(runif(9e+07),ncol=300)
# Your original code
system.time(sorted <- t(apply(a,1,sort))) # 31 secs
# sort.int with method='quick'
system.time(sorted2 <- t(apply(a,1,sort.int, method='quick'))) # 27 secs
# using a for-loop is slightly faster than apply (and avoids transpose):
system.time({sorted3 <- a; for(i in seq_len(nrow(a))) sorted3[i,] <- sort.int(a[i,], method='quick') }) # 26 secs
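Before timing the variants on the full matrix, it may be worth checking on a small one that they all produce the same result. The following is a minimal sketch; a_small, s1, s2 and s3 are names made up for this illustration, and the 1000-row size is an arbitrary choice.

```r
set.seed(1)
# Small stand-in for the real matrix (assumed size, chosen only for speed)
a_small <- matrix(runif(1000 * 300), ncol = 300)

# 1. apply + sort, transposed back to rows
s1 <- t(apply(a_small, 1, sort))

# 2. apply + sort.int with method = "quick"
s2 <- t(apply(a_small, 1, sort.int, method = "quick"))

# 3. for-loop writing each sorted row in place (no transpose needed)
s3 <- a_small
for (i in seq_len(nrow(a_small))) {
  s3[i, ] <- sort.int(a_small[i, ], method = "quick")
}

# All three variants should agree element by element
stopifnot(all(s1 == s2), all(s1 == s3))
# Every row should be non-decreasing from left to right
stopifnot(all(s3[, -1] >= s3[, -ncol(s3)]))
```

If the checks pass silently, the three approaches are interchangeable and only their speed differs.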
In principle, a better approach is to use the parallel package to sort parts of the matrix in parallel. In practice, the overhead of transferring the data to the workers seems to be too large, and my machine starts swapping since I "only" have 8 GB of memory:
library(parallel)
cl <- makeCluster(4)
system.time(sorted4 <- t(parApply(cl,a,1,sort.int, method='quick'))) # Forever...
stopCluster(cl)
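For reference, here is a sketch of the same idea with the row blocks made explicit, so it is visible what gets shipped to each worker, and with the result reassembled by rbind instead of a transpose. It runs on a small matrix with 2 workers; sort_rows, a_small and n_workers are hypothetical names for this illustration. I have not timed it on the full matrix, so whether it beats the serial loop there is an open question that depends on available memory.

```r
library(parallel)

set.seed(1)
# Small stand-in matrix so the sketch runs quickly (assumed size)
a_small <- matrix(runif(2000 * 300), ncol = 300)

# Hypothetical helper: sort every row of a block of rows
sort_rows <- function(block) {
  for (i in seq_len(nrow(block))) {
    block[i, ] <- sort.int(block[i, ], method = "quick")
  }
  block
}

n_workers <- 2  # assumption; adjust to the number of cores available
cl <- makeCluster(n_workers)

# One contiguous set of row indices per worker
idx <- splitIndices(nrow(a_small), n_workers)
blocks <- lapply(idx, function(rows) a_small[rows, , drop = FALSE])

# Each worker receives one block and returns it with sorted rows
sorted_blocks <- parLapply(cl, blocks, sort_rows)
stopCluster(cl)

# Blocks come back in their original order, so rbind restores the matrix
sorted_par <- do.call(rbind, sorted_blocks)
```

Each block is still copied to a worker and back, so the peak memory use is higher than for the for-loop version.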