Parallel `for` loop with an array as output

How can I run a for loop in parallel (so I can use all the processors on my Windows machine) with the result being a 3-dimensional array? The code I have now takes about an hour to run and looks something like this:

guad = array(NA,c(1680,170,15))
for (r in 1:15) {
  name = paste("P:/......",r,".csv",sep="")
  pp = read.table(name,sep=",",header=TRUE)
  # lots of stuff to calculate x (which is a matrix)
  guad[,,r] = x  # store the matrix as slice r
}

I have been looking at related questions and thought I could use foreach but I couldn't find a way to combine the matrices into an array.

I am new to parallel programming so any help will be very much appreciated!

1 Answer

You could do that with foreach, using the abind function to combine the results. Here's an example that uses the doParallel package as the parallel backend, which is fairly portable:

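library(doParallel)  # also attaches foreach and parallel
library(abind)
cl <- makePSOCKcluster(3)
registerDoParallel(cl)  # make %dopar% use this cluster
acomb <- function(...) abind(..., along=3)
guad <- foreach(r=1:4, .combine='acomb', .multicombine=TRUE) %dopar% {
  x <- matrix(rnorm(16), 4)  # compute x somehow
  x  # return x as the task result
}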

This uses a combine function called acomb, which calls the abind function from the abind package to combine the matrices generated by the cluster workers into a 3-dimensional array.
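To see what the combine function does on its own, here is a minimal sketch that runs without a cluster. The matrices in `mats` are made-up stand-ins for the worker results, and the abind package is assumed to be installed.

```r
library(abind)

# Three small matrices standing in for the per-task results (made-up data)
mats <- list(matrix(1:4, 2), matrix(5:8, 2), matrix(9:12, 2))

# Same combine function as in the answer: stack along the third dimension
acomb <- function(...) abind(..., along = 3)

# With .multicombine=TRUE, foreach can pass many results to acomb in one call;
# do.call imitates that here
arr <- do.call(acomb, mats)

print(dim(arr))                        # 2 2 3
print(all(arr[, , 2] == mats[[2]]))    # slice 2 holds the second matrix
```

Each task result ends up as one slice `arr[, , r]`, which matches the `guad[,,r] = x` assignment in the question.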

In this case, you can also combine the results using cbind and then modify the dim attribute afterwards to convert the resulting matrix into a 3-dimensional array:

guad <- foreach(r=1:4, .combine='cbind') %dopar% {
  x <- matrix(rnorm(16), 4)  # compute x somehow
  x  # return x as the task result
}
dim(guad) <- c(4,4,4)
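The cbind approach works because R stores matrices and arrays in column-major order, so the columns of the wide matrix are already laid out slice after slice. A small base R check of that claim, using made-up matrices in place of the worker results:

```r
# Three small matrices standing in for the per-task results (made-up data)
mats <- list(matrix(1:4, 2), matrix(5:8, 2), matrix(9:12, 2))

# Bind side by side: a 2 x 6 matrix whose columns are in task order
wide <- do.call(cbind, mats)

# Reinterpret the same data as a 2 x 2 x 3 array
dim(wide) <- c(2, 2, 3)

# Each slice matches the matrix it came from
print(identical(wide[, , 1], mats[[1]]))
print(identical(wide[, , 3], mats[[3]]))
```

This only holds when every task returns a matrix of the same shape; abind is the safer choice when results need to be joined along a different dimension.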

abind is useful since it can combine matrices and arrays in a variety of ways. Also, be aware that resetting the dim attribute may cause the matrix to be duplicated, which could be a problem for large arrays.

Note that it's a good idea to shut down the cluster at the end of the script using stopCluster(cl).
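One way to make sure the cluster is shut down even when a task fails is to wrap the loop in tryCatch with a finally clause. This is only a sketch of that idea, not part of the original example: the worker count of 2 and the placeholder matrix are assumptions chosen for illustration.

```r
library(doParallel)  # also attaches foreach and parallel
library(abind)

cl <- makePSOCKcluster(2)   # assumed worker count, adjust to your machine
registerDoParallel(cl)

acomb <- function(...) abind(..., along = 3)

guad <- tryCatch(
  foreach(r = 1:4, .combine = acomb, .multicombine = TRUE) %dopar% {
    matrix(r, nrow = 2, ncol = 2)  # placeholder for the real computation
  },
  finally = stopCluster(cl)  # runs whether or not the loop succeeded
)

print(dim(guad))  # 2 2 4
```

Here the combine function is passed as a function object rather than by name, which foreach also accepts.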

