I am trying to get the foreach package for parallel processing in R working and I am having a couple of issues:
The doMC package that is required to make foreach work does not exist on CRAN for Windows. Some blogs suggest that doSNOW instead should do the same job. However, when I run the foreach command with doSNOW, %dopar%
does not seem to work faster than %do%
. In fact it is much slower. My CPU is an Intel i7 860 @ 2.80GHz with 8 GB of RAM. Below is my code:
##Run example in 1 core
require(foreach)
require(doSNOW)
x= iris[which(iris[,5] != "setosa"),c(1,5)]
trials = 10000
system.time({
r= foreach(icount(trials), .combine=cbind) %do% {
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
}
})[3]
# elapsed
# 37.28
# Same example in 2 cores
registerDoSNOW(makeCluster(2,type="SOCK"))
getDoParWorkers()
trials = 10000
system.time({
r= foreach(icount(trials), .combine=cbind) %dopar% {
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
}
})[3]
# elapsed
# 108.14
I re-installed all the packages required but still the same problems. Here is the output:
sessionInfo()
#R version 2.15.1 (2012-06-22)
#Platform: i386-pc-mingw32/i386 (32-bit)
#locale:
#[1] LC_COLLATE=English_United States.1252
#[2] LC_CTYPE=English_United States.1252
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C
#[5] LC_TIME=English_United States.1252
#attached base packages:
#[1] parallel stats graphics grDevices datasets utils methods
#[8] base
#other attached packages:
#[1] doParallel_1.0.1 codetools_0.2-8 doSNOW_1.0.6 snow_0.3-10
#[5] iterators_1.0.6 foreach_1.4.0 rcom_2.2-5 rscproxy_2.0-5
#loaded via a namespace (and not attached):
#[1] compiler_2.15.1 tools_2.15.1
You are better off in Windows to use doParallel()
:
require(foreach)
require(doParallel)
cl <- makeCluster(6) #use 6 cores, ie for an 8-core machine
registerDoParallel(cl)
Then run your foreach() %dopar% {}
EDIT: OP mentioned still seeing the problem, so including my exact code. Running on a 4-core Windows7 VM, R 2.15.1 32-bit, only allowing doParallel
to use 3 of my cores:
require(foreach)
require(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)
x= iris[which(iris[,5] != "setosa"),c(1,5)]
trials = 1000
system.time(
foreach(icount(trials), .combine=cbind) %do%
{
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
})[3]
system.time(
foreach(icount(trials), .combine=cbind) %dopar%
{
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
})[3]
In my case, I'm getting 17.6 sec for %do%
and 14.8 sec for %dopar%
. Watching the tasks execute, it appears that much of the execution time is the cbind
, which is a common issue running parallel. In my own simulations, I have done custom work to save my detailed results as part of the parallel task rather than returning them through foreach
, to remove that part of the overhead. YMMV.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With