 

run a for loop in parallel in R

I have a for loop that is something like this:

    finalMatrix = NULL
    for (i in 1:150000) {
        tempMatrix = {}
        tempMatrix = functionThatDoesSomething() #calling a function
        finalMatrix = cbind(finalMatrix, tempMatrix)
    }

Could you tell me how to make this run in parallel?

I tried this based on an example online, but am not sure if the syntax is correct. It also didn't increase the speed much.

    finalMatrix = foreach(i=1:150000, .combine=cbind) %dopar% {
        tempMatrix = {}
        tempMatrix = functionThatDoesSomething() #calling a function
        cbind(finalMatrix, tempMatrix)
    }

Asked by kay, Jul 12 '16 at 00:07



1 Answer

Thanks for your feedback. I did look up parallel after I posted this question.

Finally, after a few tries, I got it running. I have added the code below in case it is useful to others.

    library(foreach)
    library(doParallel)

    # set up a parallel backend that uses multiple cores
    cores = detectCores()
    cl <- makeCluster(cores[1] - 1)  # leave one core free so as not to overload your computer
    registerDoParallel(cl)

    finalMatrix <- foreach(i=1:150000, .combine=cbind) %dopar% {
        tempMatrix = functionThatDoesSomething() #calling a function
        #do other things if you want

        tempMatrix #Equivalent to finalMatrix = cbind(finalMatrix, tempMatrix)
    }

    # stop the cluster
    stopCluster(cl)
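For anyone who wants to check that the parallel result matches the sequential one, here is a minimal sketch of the same pattern using only the base parallel package. It assumes a hypothetical stand-in function, make_column, in place of functionThatDoesSomething(), and an arbitrary choice of 2 workers and 8 tasks.

```r
library(parallel)

# Hypothetical stand-in for functionThatDoesSomething(): returns a 2 x 1 matrix.
make_column <- function(i) matrix(c(i, i^2), nrow = 2)

n_tasks <- 8  # small illustrative task count, not the 150000 from the question

# Sequential reference result: one column per task, bound together.
sequential <- do.call(cbind, lapply(seq_len(n_tasks), make_column))

# Same computation on a small socket cluster (2 workers, chosen arbitrarily).
cl <- makeCluster(2)
parallel_cols <- parLapply(cl, seq_len(n_tasks), make_column)
stopCluster(cl)

# parLapply returns results in input order, so cbind gives the same layout.
parallel_result <- do.call(cbind, parallel_cols)

identical(sequential, parallel_result)
```

Because each iteration depends only on its own index, the order of execution does not matter and the two matrices should agree.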

Note - If you allocate too many processes, you may get this error:

    Error in serialize(data, node$con) : error writing to connection
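One way to reduce the chance of over-allocating is to cap the worker count before creating the cluster. This is only a sketch: the cap of 4 (max_workers) is a hypothetical value, and the right limit depends on your machine and on how much memory each worker needs.

```r
library(parallel)

# detectCores() can return NA when the core count is unknown, so fall back to 1.
available <- detectCores()
if (is.na(available)) available <- 1L

# Hypothetical upper limit on workers; adjust for your own machine.
max_workers <- 4L

# Leave one core free, respect the cap, and never go below one worker.
n_workers <- max(1L, min(available - 1L, max_workers))
n_workers
```

The resulting n_workers can then be passed to makeCluster() in place of cores[1] - 1.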

Note - If .combine in the foreach statement is rbind, the final object is built by appending the output of each iteration row-wise instead of column-wise.
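The difference between the two combine functions is easiest to see on a tiny example. The sketch below uses %do% rather than %dopar%, so it runs sequentially and needs no cluster; the values c(i, i^2) are purely illustrative.

```r
library(foreach)

# Each iteration returns a length-2 vector (illustrative values only).
by_col <- foreach(i = 1:3, .combine = cbind) %do% c(i, i^2)
by_row <- foreach(i = 1:3, .combine = rbind) %do% c(i, i^2)

dim(by_col)  # 2 rows, 3 columns: one column per iteration
dim(by_row)  # 3 rows, 2 columns: one row per iteration
```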

Hope this is useful for folks trying out parallel processing in R for the first time like me.

References:
http://www.r-bloggers.com/parallel-r-loops-for-windows-and-linux/
https://beckmw.wordpress.com/2014/01/21/a-brief-foray-into-parallel-processing-with-r/

Answered by kay, Oct 03 '22 at 22:10