I am trying to use a couple of foreach loops in R to fill out a common array in parallel. A very simplified version of what I am trying to do is:
library(foreach)
set.seed(123)
x <- matrix(NA, nrow = 8, ncol = 2)
foreach(i=1:8) %dopar% {
  foreach(j=1:2) %do% {
    l <- runif(1, i, 100)
    x[i,j] <- i + j + l #This is much more complicated in my real code.
  }
}
I would like the code to update the matrix x in parallel and have the output look like:
> x
[,1] [,2]
[1,] 31.47017 82.04221
[2,] 45.07974 92.53571
[3,] 98.22533 12.41898
[4,] 59.69813 95.67223
[5,] 63.38633 55.37840
[6,] 102.94233 56.61341
[7,] 78.01407 69.25491
[8,] 26.46907 100.78390
However, I cannot figure out how to get the array to be updated. I have tried putting the x <- assignment elsewhere, but R doesn't seem to like it. I think this will be a very easy thing to fix, but all my searching has not led me there yet. Thanks.
foreach loops are used for their return value, like lapply. In this way they are very different from for loops, which are used for their side effects. By using the appropriate .combine functions, the inner foreach loop can return vectors which are combined row-wise into a matrix by the outer foreach loop:
x <- foreach(i=1:8, .combine='rbind') %dopar% {
  foreach(j=1:2, .combine='c') %do% {
    l <- runif(1, i, 100)
    i + j + l
  }
}
You can also use the nesting operator, %:%, as follows:
x <- foreach(i=1:8, .combine='rbind') %:%
  foreach(j=1:2, .combine='c') %dopar% {
    l <- runif(1, i, 100)
    i + j + l
  }
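Both snippets assume that a parallel backend has already been registered; without one, %dopar% falls back to running sequentially with a warning. Here is a minimal end-to-end sketch, assuming the doParallel package and a two-worker local cluster (the worker count and the name cl are my choices for illustration, not part of the question):

```r
library(foreach)
library(doParallel)

# Start two worker processes and register them as the %dopar% backend.
cl <- makeCluster(2)
registerDoParallel(cl)

# The inner loop returns a length-2 vector per i; the outer loop
# stacks those vectors row-wise into an 8 x 2 matrix.
x <- foreach(i = 1:8, .combine = "rbind") %:%
  foreach(j = 1:2, .combine = "c") %dopar% {
    l <- runif(1, i, 100)
    i + j + l
  }

# Shut the workers down again.
stopCluster(cl)

dim(x)
```

The matrix is built entirely from the values the loops return, so nothing needs to be assigned into x from inside the loop body.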
Note that set.seed probably won't do what you want, since it is being performed on the local machine, while the random numbers are generated in different R sessions, possibly on different machines.
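One simple way to get repeatable numbers anyway is to seed on the worker, once per outer iteration. The sketch below assumes doParallel with two local workers; the helper fill_matrix and its base_seed argument are hypothetical names introduced for illustration. Seeding with base_seed + i is a convenience rather than a statistically rigorous way to create independent streams; packages such as doRNG are designed for that purpose.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

# Hypothetical helper: builds the 8 x 2 matrix, seeding each row on the
# worker that computes it so the result does not depend on scheduling.
fill_matrix <- function(base_seed) {
  foreach(i = 1:8, .combine = "rbind") %dopar% {
    set.seed(base_seed + i)  # runs inside the worker session
    sapply(1:2, function(j) i + j + runif(1, i, 100))
  }
}

x1 <- fill_matrix(123)
x2 <- fill_matrix(123)

stopCluster(cl)

# Compare the two runs.
identical(x1, x2)
```

Because the seed depends only on i, the two runs should agree regardless of which worker ends up handling which row.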
Just to add something to Steve's answer: I think the crucial point is that the parallel backend starts multiple Rscript.exe processes (as can be seen in the task manager).
Certain objects that are used within foreach
, i.e. in your case x
, are then copied into the memory that was allocated for each of these processes. I am not sure how the copying is handled in the foreach
package, but with the *ply
functions of the plyr
package you have to explicitly state the objects that should be copied.
The different processes do not share their memory. (I am not aware of other R packages that can use shared memory...)
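A small sketch of what this means in practice, again assuming a two-worker doParallel cluster (the variable name written is mine): each worker assigns into its own copy of x, and the matrix in the calling session never sees those assignments.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

x <- matrix(NA_real_, nrow = 8, ncol = 2)

# Each worker writes into its private copy of x and returns what it wrote.
written <- foreach(i = 1:8, .combine = "c") %dopar% {
  x[i, 1] <- i
  x[i, 1]
}

stopCluster(cl)

written        # the values the workers stored in their own copies
all(is.na(x))  # whether the original matrix is still untouched
```

This is why the loop in the question leaves x full of NA values: the updates happen, but only in the workers' copies.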
One can demonstrate that the matrix x is actually copied by using .Internal(inspect(x)) to print the memory location of object x.
library(foreach)
library(doParallel)
x <- matrix(1:16, nrow = 8, ncol = 2)
#print memory location of x
capture.output(.Internal(inspect(x)))[1]
#create parallel backend; in our case two Rscript.exe processes
workers <- makeCluster(2)
registerDoParallel(workers)
y <- foreach(i=1:8, .combine='rbind') %dopar% {
  #return memory location of x
  capture.output(.Internal(inspect(x)))[1]
}
#print matrix y
#there should be two different memory locations -
#according to the two Rscript.exe processes started above
y
#close parallel backend
stopCluster(workers)
The matrix y reads
[,1]
result.1 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(1),ATT] (len=16, tl=0) 1,2,3,4,5,..."
result.2 "@0x0000000003dab9b0 13 INTSXP g0c5 [NAM(1),ATT] (len=16, tl=0) 1,2,3,4,5,..."
result.3 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(2),ATT] (len=16, tl=0) 1,2,3,4,5,..."
result.4 "@0x0000000003dab910 13 INTSXP g0c5 [NAM(2),ATT] (len=16, tl=0) 1,2,3,4,5,..."
...
You should find two different memory addresses there.