How can I run a for loop in parallel (so I can use all the processors on my Windows machine) with the result being a 3-dimensional array? The code I have now takes about an hour to run and is something like:
guad = array(NA,c(1680,170,15))
for (r in 1:15)
{
  name = paste("P:/......",r,".csv",sep="")
  pp = read.table(name,sep=",",header=T)
  #lots of stuff to calculate x (which is a matrix)
  guad[,,r] = x
}
I have been looking at related questions and thought I could use foreach, but I couldn't find a way to combine the matrices into an array.
I am new to parallel programming so any help will be very much appreciated!
You could do that with foreach using the abind function. Here's an example using the doParallel package as the parallel backend, which is fairly portable:
library(doParallel)
library(abind)
cl <- makePSOCKcluster(3)
registerDoParallel(cl)
acomb <- function(...) abind(..., along=3)
guad <- foreach(r=1:4, .combine='acomb', .multicombine=TRUE) %dopar% {
x <- matrix(rnorm(16), 4) # compute x somehow
x # return x as the task result
}
This uses a combine function called acomb that calls the abind function from the abind package to combine the matrices generated by the cluster workers into a 3-dimensional array.
In this case, you can also combine the results using cbind and then modify the dim attribute afterwards to convert the resulting matrix into a 3-dimensional array:
guad <- foreach(r=1:4, .combine='cbind') %dopar% {
x <- matrix(rnorm(16), 4) # compute x somehow
x # return x as the task result
}
dim(guad) <- c(4,4,4)
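If you want to convince yourself that the cbind approach puts each matrix in the right slice, here is a small sequential check in base R. It is only a sketch: lapply stands in for the foreach loop, and the names mats, flat, guad_check and all_match are made up for the illustration.

```r
# Sequential stand-in for the foreach loop: build the per-task matrices first.
set.seed(1)
mats <- lapply(1:4, function(r) matrix(rnorm(16), 4))

# cbind places the matrices side by side, giving a 4 x 16 matrix.
flat <- do.call(cbind, mats)

# R stores matrices column by column, so reshaping to 4 x 4 x 4 turns
# each block of four columns into one slice along the third dimension.
guad_check <- flat
dim(guad_check) <- c(4, 4, 4)

# Compare every slice with the matrix that produced it.
all_match <- all(vapply(1:4, function(r) {
  identical(guad_check[, , r], mats[[r]])
}, logical(1)))
print(all_match)
```

This works because the matrices all have the same shape; if they differ in size, the reshaping trick no longer applies.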
abind is useful since it can combine matrices and arrays in a variety of ways. Also, be aware that resetting the dim attribute may cause the matrix to be duplicated, which could be a problem for large arrays.
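To give an idea of what "a variety of ways" means, here is a short sketch of the along argument of abind on two small matrices. The matrices a and b and the result names are just illustrative.

```r
library(abind)

a <- matrix(1:16, 4)
b <- matrix(17:32, 4)

# along = 1 binds by rows, like rbind: the result is 8 x 4.
by_rows <- abind(a, b, along = 1)

# along = 2 binds by columns, like cbind: the result is 4 x 8.
by_cols <- abind(a, b, along = 2)

# along = 3 adds a new third dimension: the result is 4 x 4 x 2.
by_slice <- abind(a, b, along = 3)

dim(by_rows)
dim(by_cols)
dim(by_slice)
```

The along = 3 case is the one used by the acomb function above.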
Note that it's a good idea to shut down the cluster at the end of the script using stopCluster(cl).
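Putting it together for the dimensions in the question, a minimal sketch might look like the following. The random matrix is only a placeholder for reading the CSV file and computing x, the worker count of 2 is arbitrary, and wrapping the loop in tryCatch with finally is just one way to make sure the cluster is stopped even if a task fails.

```r
library(doParallel)  # also attaches foreach and parallel
library(abind)

cl <- makePSOCKcluster(2)  # arbitrary worker count for this sketch
registerDoParallel(cl)

acomb <- function(...) abind(..., along = 3)

guad <- tryCatch(
  foreach(r = 1:15, .combine = "acomb", .multicombine = TRUE) %dopar% {
    # Placeholder for the real work: read file r and compute x here.
    x <- matrix(rnorm(1680 * 170), nrow = 1680, ncol = 170)
    x  # return x as the task result
  },
  finally = stopCluster(cl)  # runs whether or not the loop succeeds
)

dim(guad)
```

If the real computation uses functions from other packages, they can be loaded on the workers through the .packages argument of foreach.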