I'm trying to process a bunch of CSV files and return data frames in R, in parallel using mclapply(). I have a 64-core machine, and I can't seem to get any more than 1 core utilized at the moment using mclapply(). In fact, it is a bit quicker to run lapply() rather than mclapply() at the moment. Here is an example that shows that mclapply() is not utilizing more of the available cores:
library(parallel)
test <- lapply(1:100,function(x) rnorm(10000))
system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
system.time(x <- mclapply(test,function(x) loess.smooth(x,x), mc.cores=32))
user system elapsed
0.000 0.000 7.234
user system elapsed
0.000 0.000 8.612
Is there some trick to getting this working? I had to compile R from source on this machine (v3.0.1); are there some compile flags that I missed to allow forking? detectCores() tells me that I do indeed have 64 cores to play with...
Any tips appreciated!
I get similar results to you, but if I change rnorm(10000) to rnorm(100000), I get a significant speedup. I would guess that the additional overhead is canceling out any performance benefit for such a small-scale problem.
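For reference, here is a minimal sketch of that comparison with the larger per-task input. The names n_points, n_tasks, n_workers and smooth_one are only illustrative, and the task count is cut down from the question's 100 so the run stays short. Timings will depend on the machine, so none are shown here. Note that forking is not available on Windows, where mclapply() only accepts mc.cores = 1.

```r
library(parallel)

# Illustrative settings (assumptions, not from the original post, except n_points):
n_points  <- 100000  # per-task vector length suggested in the answer above
n_tasks   <- 8       # fewer tasks than the question's 100, to keep the run short
n_workers <- max(1L, min(4L, detectCores(), na.rm = TRUE))

# mclapply() relies on forking, which Windows does not support
if (.Platform$OS.type == "windows") n_workers <- 1L

# Same workload shape as in the question: smooth each random vector against itself
test <- lapply(seq_len(n_tasks), function(i) rnorm(n_points))
smooth_one <- function(v) loess.smooth(v, v)

# Serial baseline
t_serial <- system.time(res_serial <- lapply(test, smooth_one))

# Forked version; each worker handles a share of the tasks
t_parallel <- system.time(
  res_parallel <- mclapply(test, smooth_one, mc.cores = n_workers)
)

# Compare the "elapsed" entries of the two timings
print(t_serial)
print(t_parallel)
```

If the elapsed time for mclapply() is still no better than lapply() with the larger input, that would point away from overhead and toward something in the setup, such as how R was built or how the process is pinned to cores.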