Writing to global variables when using doSNOW for parallelization in R?

Tags:

foreach

r

Is there a problem with accessing or writing to a global variable when using the doSNOW package on multiple cores?

In the program below, each call to MyCalculations(ii) writes to the ii-th column of the matrix globalVariable.

Do you think the result will be correct? Are there hidden catches?

Thanks a lot!

P.S. I have to write to the global variable because this is a simplified example. In fact, I have many outputs that need to be transported out of the parallel loops, so writing to global variables is probably the only way.

library(doSNOW)
MaxSearchSpace=44*5
globalVariable=matrix(0, 10000, MaxSearchSpace)
cl<-makeCluster(7)
registerDoSNOW(cl)
# MyCalculations(ii) is defined elsewhere (not shown); it writes its
# output into column ii of globalVariable
foreach (ii = 2:MaxSearchSpace, .combine=cbind, .verbose=FALSE) %dopar%
  {
   MyCalculations(ii)
  }

stopCluster(cl)

P.S. To be clear, I am asking whether, within the doSNOW framework, there is any danger in accessing or writing to global variables. Thanks!

Luna asked Feb 22 '12 22:02



1 Answer

Since this question is a couple of months old, I hope you've found an answer by now. However, in case you're still interested in feedback, here's something to consider:

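When using foreach with a parallel backend, you won't be able to assign to variables in R's global environment the way you're attempting (as you have probably noticed). With a sequential backend the assignment will work, but not with a parallel one such as doSNOW: each worker is a separate R process that operates on its own copy of the data, so anything a worker assigns stays in that process and never reaches your main R session.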
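A minimal sketch of that difference, assuming doSNOW (which also attaches foreach and snow) is installed. Instead of the question's matrix it uses a hypothetical environment `store`, because an environment is a reference object: if anything could be shared with the workers, it would be this. The two-worker cluster size is an arbitrary choice.

```r
library(doSNOW)   # also attaches foreach and snow

# Hypothetical shared state: an environment holding a result vector.
# Environments are reference objects, so updates made in THIS R session
# are visible everywhere in this session.
store <- new.env()
store$g <- numeric(4)

# Sequential backend: the loop body runs in the current R session
invisible(foreach(ii = 1:4) %do% {
  store$g[ii] <- ii
})
g_seq <- store$g                      # now 1 2 3 4

# Parallel backend: `store` is serialized and copied to each worker process,
# so the workers update their own copies and nothing is sent back
store$g[] <- 0
cl <- makeCluster(2)
registerDoSNOW(cl)
invisible(foreach(ii = 1:4) %dopar% {
  store$g[ii] <- ii
  NULL
})
stopCluster(cl)
g_par <- store$g                      # still 0 0 0 0 in this session
```

The same holds for plain variables modified with `<<-` or `assign()` inside the loop body: whatever a worker assigns stays in that worker's process.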

Instead, have each iteration return all of its results in a list, and assign the output of foreach to an object, so that you can extract the appropriate results once all calculations have been completed.
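For the single-matrix case in the question, a minimal sketch of this idea needs no global matrix at all: let each iteration return its column and keep `.combine = cbind`, so foreach binds the columns in iteration order. The body of `MyCalculations()` below is a hypothetical stand-in that returns a column instead of writing one, and two workers are used instead of seven only to keep the example small.

```r
library(doSNOW)

# Hypothetical stand-in for the asker's MyCalculations(): it RETURNS the
# column for iteration ii instead of writing into a global matrix
MyCalculations <- function(ii) {
  rep(ii, 10000)
}

MaxSearchSpace <- 44 * 5
cl <- makeCluster(2)
registerDoSNOW(cl)

# .combine = cbind binds the returned vectors as columns, in iteration order
globalVariable <- foreach(ii = 2:MaxSearchSpace, .combine = cbind) %dopar% {
  MyCalculations(ii)
}
stopCluster(cl)

dim(globalVariable)   # 10000 rows, one column per value of ii
```

Because the loop starts at `ii = 2`, column k of the result holds the output for `ii = k + 1`.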

My suggestion starts similarly to your example:

library(doSNOW)
MaxSearchSpace <- 44*5
cl <- makeCluster(parallel::detectCores())

# do not create the globalVariable object

registerDoSNOW(cl)

# Save the results of the `foreach` iterations as
# a list of lists in an object (`theRes`)

theRes <- foreach (ii = 2:MaxSearchSpace, .verbose = FALSE) %dopar%
  {
# do some calculations
   theNorms <- rnorm(10000)
   thePois <- rpois(10000, 2)
# store the results in a list
   list(theNorms, thePois)
  }

# shut down the worker processes once the loop is done
stopCluster(cl)

After all iterations have completed, extract the results from theRes and store them as separate objects (e.g., globalVariable1, globalVariable2, etc.):

globalVariable1 <- do.call(cbind, lapply(theRes, "[[", 1))
globalVariable2 <- do.call(cbind, lapply(theRes, "[[", 2))
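If you would rather assemble globalVariable1 and globalVariable2 while the loop runs, a custom `.combine` function can do it. This is a sketch under assumptions: `cbindLists()` is a hypothetical helper, and `.init = list(NULL, NULL)` gives it two empty accumulators to bind onto.

```r
library(doSNOW)

MaxSearchSpace <- 44 * 5

# Hypothetical combine function: appends the k-th element of the newest
# iteration's result as a new column of the k-th accumulated matrix
cbindLists <- function(acc, res) {
  Map(cbind, acc, res)
}

cl <- makeCluster(2)
registerDoSNOW(cl)
theMats <- foreach(ii = 2:MaxSearchSpace,
                   .combine = cbindLists,
                   .init = list(NULL, NULL)) %dopar% {
  list(rnorm(10000), rpois(10000, 2))
}
stopCluster(cl)

globalVariable1 <- theMats[[1]]   # normal draws, one column per iteration
globalVariable2 <- theMats[[2]]   # Poisson draws, one column per iteration
```

The trade-off: every combine step copies both growing matrices, so for long loops collecting the list and binding once with do.call(cbind, ...), as above, is usually cheaper.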

Keep in mind, however, that if the calculations in each iteration depend on the results of previous iterations, this type of parallel computing is not the right approach.

BenBarnes answered Oct 07 '22 22:10