
nested foreach loops in R to update common array

I am trying to use a couple of foreach loops in R to fill out a common array in parallel. A very simplified version of what I am trying to do is:

library(foreach)
set.seed(123)
x <- matrix(NA, nrow = 8, ncol = 2)

foreach(i=1:8) %dopar% {
    foreach(j=1:2) %do% {

      l <- runif(1, i, 100)
      x[i,j] <- i + j + l     #This is much more complicated in my real code.   

    }
}

I would like the code to update the matrix x in parallel and have the output look like:

> x
       [,1]      [,2]
 [1,]  31.47017  82.04221
 [2,]  45.07974  92.53571
 [3,]  98.22533  12.41898
 [4,]  59.69813  95.67223
 [5,]  63.38633  55.37840
 [6,] 102.94233  56.61341
 [7,]  78.01407  69.25491
 [8,]  26.46907 100.78390 

However, I cannot figure out how to get the array to be updated. I have tried putting the x <- assignment elsewhere, but that does not seem to work. I think this will be a very easy thing to fix, but all my searching has not led me there yet. Thanks.

joshdr83 asked Jul 07 '13 00:07


2 Answers

foreach loops are used for their return value, like lapply. In this way they are very different from for loops, which are used for their side effects. With appropriate .combine functions, the inner foreach loop returns vectors that the outer foreach loop combines row-wise into a matrix:

x <- foreach(i=1:8, .combine='rbind') %dopar% {
   foreach(j=1:2, .combine='c') %do% {
     l <- runif(1, i, 100)
     i + j + l  
   }
}

You can also use the nesting operator, %:%:

x <- foreach(i=1:8, .combine='rbind') %:%
   foreach(j=1:2, .combine='c') %dopar% {
     l <- runif(1, i, 100)
     i + j + l  
   }
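
Both versions assume that a parallel backend has already been registered; without one, %dopar% runs sequentially and issues a warning. A minimal end-to-end sketch, assuming the doParallel package and a two-worker local cluster (the worker count is an arbitrary choice for illustration):

```r
library(foreach)
library(doParallel)

# Start two local worker processes and register them as the backend.
workers <- makeCluster(2)
registerDoParallel(workers)

# Each (i, j) pair yields one value; the inner loop combines values into
# a vector of length 2, and the outer loop stacks those vectors as rows.
x <- foreach(i = 1:8, .combine = 'rbind') %:%
  foreach(j = 1:2, .combine = 'c') %dopar% {
    l <- runif(1, i, 100)
    i + j + l
  }

# Shut the workers down once the result has been collected.
stopCluster(workers)

dim(x)  # 8 rows, 2 columns
```

The matrix is built entirely from return values, so nothing inside the loop body needs to assign into x.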

Note that set.seed probably won't do what you want, since it is being performed on the local machine, while the random numbers are generated in different R sessions, possibly on different machines.
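
One way to get reproducible random numbers across workers is the doRNG package, which is not part of the original answer and is assumed here to be installed. The sketch below uses its %dorng% operator in place of %dopar%; the helper name run_once is hypothetical, and the inner loop is written with vapply rather than a nested foreach to keep the example small:

```r
library(foreach)
library(doParallel)
library(doRNG)

workers <- makeCluster(2)
registerDoParallel(workers)

# Hypothetical helper: runs the same parallel loop from the same seed.
run_once <- function() {
  # %dorng% uses the seed set here to derive one independent
  # random-number stream per outer iteration.
  set.seed(123)
  foreach(i = 1:8, .combine = 'rbind') %dorng% {
    # Plain inner loop over j; each row is a numeric vector of length 2.
    vapply(1:2, function(j) i + j + runif(1, i, 100), numeric(1))
  }
}

x1 <- run_once()
x2 <- run_once()

stopCluster(workers)

# Compare the two runs; with %dorng% they are expected to match.
identical(x1, x2)
```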

Steve Weston answered Oct 23 '22 05:10


Just to add something to Steve's answer: I think the crucial point is that the parallel backend starts multiple Rscript.exe processes (as can be seen in the Task Manager). Objects that are used within foreach, such as x in your case, are then copied into the memory that was allocated for each of these processes. I am not sure how the copying is handled in the foreach package, but with the *ply functions of the plyr package you have to explicitly state the objects that should be copied. The different processes do not share their memory. (I am not aware of other R packages that can use shared memory...)
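
The practical consequence for the question's code can be sketched as follows, assuming a two-worker doParallel cluster: an assignment to x inside the loop body only changes a worker's private copy, so the x in the calling session stays as it was, and the values are available only through the loop's return value (the name res is just for illustration).

```r
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers)

x <- matrix(NA_real_, nrow = 8, ncol = 2)

# The assignment below changes only the worker's private copy of x.
res <- foreach(i = 1:8) %dopar% {
  x[i, ] <- i
  x[i, ]   # return the modified row so the calling session can see it
}

stopCluster(workers)

all(is.na(x))   # the x in the calling session is still all NA
res[[3]]        # the values come back only through the return value
```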

One can demonstrate that the matrix x is actually copied by using .Internal(inspect(x)) to print object x's memory location.

library(foreach)
library(doParallel)

x <- matrix(1:16, nrow = 8, ncol = 2)
# print the memory location of x
capture.output(.Internal(inspect(x)))[1]

# create the parallel backend; in our case, two Rscript.exe processes
workers <- makeCluster(2)
registerDoParallel(workers)

y <- foreach(i = 1:8, .combine = 'rbind') %dopar% {
    # return the memory location of x
    capture.output(.Internal(inspect(x)))[1]
}

# print matrix y
# there should be two different memory locations,
# one for each of the two Rscript.exe processes started above
y

# close the parallel backend
stopCluster(workers)

The matrix y reads

       [,1]                                                                          
result.1 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(1),ATT] (len=16, tl=0) 1,2,3,4,5,..."
result.2 "@0x0000000003dab9b0 13 INTSXP g0c5 [NAM(1),ATT] (len=16, tl=0) 1,2,3,4,5,..."
result.3 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(2),ATT] (len=16, tl=0) 1,2,3,4,5,..."
result.4 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(2),ATT] (len=16, tl=0) 1,2,3,4,5,..."
...

You should find two different memory addresses there.
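
A related check that does not rely on .Internal is to ask each task for its process ID with base R's Sys.getpid(). This is a swapped-in technique rather than the one used above, sketched under the same assumption of a two-worker doParallel cluster:

```r
library(foreach)
library(doParallel)

workers <- makeCluster(2)
registerDoParallel(workers)

# Ask each task for the ID of the process it ran in.
pids <- foreach(i = 1:8, .combine = 'c') %dopar% Sys.getpid()

stopCluster(workers)

unique(pids)             # at most two distinct worker process IDs
Sys.getpid() %in% pids   # FALSE: the calling session is a separate process
```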

cryo111 answered Oct 23 '22 07:10