I'm seeing strange behaviour on my computer when distributing work across its cores using doMC and foreach. Does anyone know why I get better performance with a single core than with 2 cores? As you can see, running the same code without registering any cores (which supposedly uses only 1 core) is much more time-efficient. %do% seems to perform better than %dopar%, and registering 2 of the 4 cores makes the run take far longer.
require(foreach)
require(doMC)
# 1-core
> system.time(m <- foreach(i=1:100) %dopar%
+ matrix(rnorm(1000*1000), ncol=5000) )
user system elapsed
9.285 1.895 11.083
> system.time(m <- foreach(i=1:100) %do%
+ matrix(rnorm(1000*1000), ncol=5000) )
user system elapsed
9.139 1.879 10.979
# 2-core
> registerDoMC(cores=2)
> system.time(m <- foreach(i=1:100) %dopar%
+ matrix(rnorm(1000*1000), ncol=5000) )
user system elapsed
3.322 3.737 132.027
> system.time(m <- foreach(i=1:100) %do%
+ matrix(rnorm(1000*1000), ncol=5000) )
user system elapsed
9.744 2.054 11.740
Using 4 cores, a few trials yield very different outcomes:
> registerDoMC(cores=4)
> system.time(m <- foreach(i=1:100) %dopar%
+ { matrix(rnorm(1000*1000), ncol=5000) } )
user system elapsed
11.522 4.082 24.444
> system.time(m <- foreach(i=1:100) %dopar%
+ { matrix(rnorm(1000*1000), ncol=5000) } )
user system elapsed
21.388 6.299 25.437
> system.time(m <- foreach(i=1:100) %dopar%
+ { matrix(rnorm(1000*1000), ncol=5000) } )
user system elapsed
17.439 5.250 9.300
> system.time(m <- foreach(i=1:100) %dopar%
+ { matrix(rnorm(1000*1000), ncol=5000) } )
user system elapsed
17.480 5.264 9.170
It's combining the results that eats up the processing time. These are the timings on my machine for the cores=2 scenario when no results are returned. It's essentially the same code, except that the created matrices are discarded instead of being returned:
> system.time(m <- foreach(i=1:100) %do%
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
user system elapsed
13.793 0.376 14.197
> system.time(m <- foreach(i=1:100) %dopar%
+ { matrix(rnorm(1000*1000), ncol=5000); NULL } )
user system elapsed
8.057 5.236 9.970
Still not optimal, but at least the parallel version is now faster.
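If the full matrices are not actually needed afterwards, the same idea can be taken one step further by reducing each matrix inside the worker and returning only the small result. The following is a minimal sketch under that assumption (the question does not say what the matrices are used for); mean() and the name means are illustrative placeholders, not part of the original code.

```r
library(foreach)
library(doMC)

registerDoMC(cores = 2)

# Each worker builds its matrix locally and returns a single number,
# so almost nothing has to be sent back to the parent process.
# mean() is a stand-in for whatever reduction is actually needed.
means <- foreach(i = 1:100, .combine = c) %dopar% {
  x <- matrix(rnorm(1000 * 1000), ncol = 5000)
  mean(x)
}
```

With .combine = c the 100 scalars are concatenated into one numeric vector instead of a list of 100 large matrices.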
This is from the documentation of doMC:
"The doMC package provides a parallel backend for the foreach/%dopar% function using the multicore functionality of the parallel package."
Now, parallel uses a fork mechanism to spawn identical copies of the R process. Collecting results from separate processes is an expensive task, and this is what you see in your time measurements.
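To get a rough feel for how much data has to be collected, it can help to measure one result and scale it up. Each matrix in the question holds 1e6 doubles at 8 bytes each, so about 8 MB, and 100 of them come to roughly 800 MB that the parent has to receive from the forked children. The sketch below measures this with object.size() and then uses mclapply() from parallel directly, as an assumed stand-in for what happens behind %dopar%; the variable names are illustrative.

```r
library(parallel)

# One result as produced in the question: 1e6 doubles at 8 bytes each.
one <- matrix(rnorm(1000 * 1000), ncol = 5000)
bytes_per_result <- as.numeric(object.size(one))

# Approximate volume the parent must receive for 100 iterations, in MB.
total_mb <- 100 * bytes_per_result / 1e6

# Forked children (fork-based, so Unix-like systems only); each child
# returns just a scalar, so the transfer back to the parent is tiny.
sums <- mclapply(1:4, function(i) {
  sum(matrix(rnorm(1000 * 1000), ncol = 5000))
}, mc.cores = 2)
```

Comparing total_mb with the handful of bytes returned per task here shows why discarding or shrinking the results changes the timings so much.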