I have been using the library 'doParallel' in R to increase the speed of a set of functions. However, I have run into an error that I cannot solve. I believe the following code isolates the pith of the problem:
library(Matrix)
library(doParallel)
test_mat = Matrix(c(0,1,2,NA,0,0,2,NA,1,NA,1,2,2,NA,0,1,0,2,2,2,0,0,NA,NA,1,2,1,1,2,1,rep(NA,5)), ncol=7, byrow=TRUE, sparse=TRUE)
par_func <- function(mat, ncores)
{
    cl <- makePSOCKcluster(ncores)
    clusterSetRNGStream(cl)
    registerDoParallel(cl, cores = ncores)
    df = data.frame(1:7, NA)
    temp_vec = foreach(i=iter(df, by='row'), .combine=rbind) %dopar%
    {
        i[,2] <- sum(mat[,i[,1]] == 1, na.rm = TRUE) + 1
    }
    stopCluster(cl)
    return(temp_vec)
}
par_func(mat=test_mat, ncores=5)
This produces the following error message:
Error in { : task 1 failed - "object of type 'S4' is not subsettable"
This function works if 'mat' is of class 'matrix' rather than 'dgCMatrix', so the problem appears to be due to the subsetting of a sparse Matrix. Do I have any options to work around this problem? The matrix 'mat' can be very large and can contain many zeros, so I would like to continue to work with sparse matrices.
The fundamental problem is that the workers haven't loaded the Matrix package, so they don't know how to subset the Matrix object "mat". You can fix that with the foreach .packages option:
temp_vec = foreach(i=iter(df, by='row'), .packages='Matrix', .combine=rbind) %dopar% {
# snip
}
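For completeness, here is a minimal sketch of the whole function with that option in place. It is a variation rather than your exact code: it loops over column indices instead of the rows of a data frame, uses .combine = c, and stops the cluster via on.exit. The name par_func_fixed and the choice of 2 workers are only for illustration.

```r
library(Matrix)
library(doParallel)  # also attaches foreach, iterators and parallel

test_mat <- Matrix(c(0,1,2,NA,0,0,2,NA,1,NA,1,2,2,NA,0,1,0,2,2,2,0,0,NA,NA,
                     1,2,1,1,2,1,rep(NA,5)),
                   ncol = 7, byrow = TRUE, sparse = TRUE)

# Hypothetical name; a variation on par_func from the question.
par_func_fixed <- function(mat, ncores) {
  cl <- makePSOCKcluster(ncores)
  on.exit(stopCluster(cl), add = TRUE)  # shut the workers down even on error
  registerDoParallel(cl)

  # .packages = "Matrix" asks each worker to load Matrix before running the
  # body, so subsetting the sparse matrix dispatches to the Matrix methods.
  foreach(j = seq_len(ncol(mat)), .packages = "Matrix", .combine = c) %dopar% {
    sum(mat[, j] == 1, na.rm = TRUE) + 1
  }
}

counts <- par_func_fixed(mat = test_mat, ncores = 2)
print(counts)
```

The matrix stays sparse throughout; only one column at a time is turned into an ordinary numeric vector inside the loop body.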
Note that your example fails on all platforms, but if you were to register doParallel with:
registerDoParallel(4)
then your foreach loop would have worked on Linux and Mac OS X, but failed on Windows! The reason is that on Linux and Mac OS X, the mclapply function would be used, but on Windows, a cluster object would be implicitly created, and then the clusterApplyLB function would be used. The workers are forked by mclapply, so they inherit the parent's environment, including the loaded packages, and thus the foreach loop works. But the environment isn't inherited by the workers when using makePSOCKcluster, so you have to initialize the environment of the workers using things like the .packages option; otherwise the foreach loop fails. It's ironic that since the doParallel package hides this difference in order to make things easier, it sets up a little portability trap for Windows users.
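If you want to see this for yourself, the following sketch uses only the parallel package to check whether Matrix is attached on PSOCK workers before and after loading it explicitly. The variable names and the choice of 2 workers are my own, and the clusterEvalQ call is only a rough stand-in for what the .packages option arranges for you.

```r
library(parallel)

cl <- makePSOCKcluster(2)

# PSOCK workers are fresh R sessions, so Matrix should not be attached yet.
attached_before <- unlist(clusterEvalQ(cl, "package:Matrix" %in% search()))

# Attach Matrix on every worker (roughly what .packages = "Matrix" does).
invisible(clusterEvalQ(cl, library(Matrix)))
attached_after <- unlist(clusterEvalQ(cl, "package:Matrix" %in% search()))

stopCluster(cl)

print(attached_before)
print(attached_after)
```

Workers forked by mclapply skip this step because they start as copies of the master process, with whatever packages it had already attached.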
There are other ways in which this example can be improved (as mentioned by @agstudy), but as I said, the fundamental problem is that the Matrix package isn't loaded on the workers.