
doMC in R and foreach loop not working

I am trying to get the foreach package for parallel processing in R working and I am having a couple of issues:

The doMC package that is required to make foreach work is not available on CRAN for Windows. Some blogs suggest that doSNOW should do the same job instead. However, when I run the foreach command with doSNOW, %dopar% is not faster than %do%. In fact, it is much slower. My CPU is an Intel i7 860 @ 2.80GHz with 8 GB of RAM. Below is my code:

##Run example in 1 core 
require(foreach)
require(doSNOW)
x= iris[which(iris[,5] != "setosa"),c(1,5)]
trials = 10000
system.time({
r= foreach(icount(trials), .combine=cbind) %do% {
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
}
})[3]
#  elapsed 
#  37.28 

# Same example in 2 cores
registerDoSNOW(makeCluster(2,type="SOCK"))
getDoParWorkers()
trials = 10000
system.time({
r= foreach(icount(trials), .combine=cbind) %dopar% {
ind=sample(100,100,replace=TRUE)
results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit))
coefficients(results1)
}
})[3]
# elapsed 
#  108.14 

I re-installed all the required packages but still see the same problem. Here is my session info:

sessionInfo()

#R version 2.15.1 (2012-06-22) 
#Platform: i386-pc-mingw32/i386 (32-bit)

#locale:
#[1] LC_COLLATE=English_United States.1252 
#[2] LC_CTYPE=English_United States.1252   
#[3] LC_MONETARY=English_United States.1252
#[4] LC_NUMERIC=C                          
#[5] LC_TIME=English_United States.1252    

#attached base packages:
#[1] parallel  stats     graphics  grDevices datasets  utils     methods  
#[8] base     

#other attached packages:
#[1] doParallel_1.0.1 codetools_0.2-8  doSNOW_1.0.6     snow_0.3-10     
#[5] iterators_1.0.6  foreach_1.4.0    rcom_2.2-5       rscproxy_2.0-5  

#loaded via a namespace (and not attached):
#[1] compiler_2.15.1 tools_2.15.1   
asked Jan 17 '23 00:01 by Stefanos Poulis


1 Answer

On Windows you are better off using the doParallel package:

require(foreach)
require(doParallel)
cl <- makeCluster(6) # use 6 workers, e.g. on an 8-core machine
registerDoParallel(cl)

Then run your foreach() %dopar% {}
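
For completeness, here is a minimal sketch of the whole register/run/clean-up cycle. The worker count of 2 and the toy squaring task are assumptions chosen only for illustration; they are not part of the original question.

```r
library(foreach)
library(doParallel)

# Assumption: 2 workers; adjust to the number of cores you want to use.
cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical toy task: each iteration squares its index on a worker.
# .combine = c concatenates the per-iteration results into one vector.
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}

stopCluster(cl)   # shut down the worker processes
registerDoSEQ()   # switch foreach back to the sequential backend

print(squares)
```

Stopping the cluster when you are done avoids leaving idle R worker processes behind, which is easy to overlook when the cluster is created inline inside the register call.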

EDIT: OP mentioned still seeing the problem, so I am including my exact code. I am running on a 4-core Windows 7 VM with R 2.15.1 32-bit, allowing doParallel to use only 3 of my cores:

require(foreach)
require(doParallel)
cl <- makeCluster(3)
registerDoParallel(cl)

x= iris[which(iris[,5] != "setosa"),c(1,5)]

trials = 1000 
system.time( 
  foreach(icount(trials), .combine=cbind) %do% 
  {  
    ind=sample(100,100,replace=TRUE) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    coefficients(results1) 
  })[3] 

system.time( 
  foreach(icount(trials), .combine=cbind) %dopar% 
  {  
    ind=sample(100,100,replace=TRUE) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    results1 = glm(x[ind,2]~x[ind,1],family=binomial(logit)) 
    coefficients(results1) 
  })[3] 

In my case, I'm getting 17.6 sec for %do% and 14.8 sec for %dopar%. Watching the tasks execute, it appears that much of the execution time goes to the cbind, which is a common issue when running in parallel. In my own simulations, I have done custom work to save my detailed results as part of the parallel task rather than returning them through foreach, to remove that part of the overhead. YMMV.
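
One related way to cut the per-task overhead (not the file-saving approach I described, just a sketch of a different technique) is to hand each worker a whole chunk of trials, so foreach only has to combine a few matrices instead of 1000 small vectors. The worker count and the chunk_sizes variable are assumptions made for this example.

```r
library(foreach)
library(doParallel)

n_workers <- 2    # assumption: number of worker processes
trials <- 1000

cl <- makeCluster(n_workers)
registerDoParallel(cl)

x <- iris[which(iris[, 5] != "setosa"), c(1, 5)]

# Split the trials into one chunk per worker (hypothetical helper variable).
chunk_sizes <- rep(trials %/% n_workers, n_workers)
chunk_sizes[1] <- chunk_sizes[1] + trials %% n_workers

# Each task runs n bootstrap fits locally; replicate() returns a matrix
# with one column per fit, so only n_workers matrices are sent back and
# passed to cbind.
r <- foreach(n = chunk_sizes, .combine = cbind) %dopar% {
  replicate(n, {
    ind <- sample(100, 100, replace = TRUE)
    coefficients(glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit)))
  })
}

stopCluster(cl)

dim(r)  # one column per trial
```

Whether this beats the sequential loop still depends on how expensive each fit is compared with starting the workers and shipping the data to them, so it is worth timing both versions with system.time() on your own machine.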

answered Jan 28 '23 15:01 by khoxsey