Unknown error (worker initialization failed: 21) in foreach() with doParallel cluster (R)

First-time poster here. Before posting, I read FAQs and posting guides as recommended so I hope I am posting my question in the correct format.

I am running foreach() tasks using the doParallel cluster backend in the 64-bit R console, v. 3.1.2, on Windows 8. The relevant packages are foreach v. 1.4.2 and doParallel v. 1.0.8.

Some sample code to give you an idea of what I am doing:

out <- foreach(j = 1:nsim.times, .combine = rbind, .packages = c("vegan")) %dopar% {
  # list.mat is a list of matrices and compute.function is a custom function
  b <- oecosimu(list.mat[[j]], compute.function, "quasiswap", nsimul = nsim.swap)
  ..... # some intermediate code
  # A and B are some emergent properties derived from object b above
  return(c(A, B))
}

In one of my tasks, I encountered an error I have never seen before. I tried to search for the error online but couldn't find any clues.

The error was:

Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
worker initialization failed: 21

The one time I got this error, I had run the code after stopping a previous task (using the Stop button in the R console) without closing the cluster via stopCluster().

I ran the same code again after stopping the cluster via stopCluster() and registering a new cluster with makeCluster() and registerDoParallel(), and the task ran fine.

Has anyone encountered this error or might have any clues/tips as to how I could figure out the issue? Could the error be related to not stopping the previous doParallel cluster?

Any help or advice is much appreciated!

Cheers and thanks!

XG_23 asked Feb 10 '23 23:02


1 Answer

I agree that the problem was caused by stopping the master and then continuing to use the cluster object, which was left in a corrupt state. There was probably unread data in the socket connections to the cluster workers, causing the master and workers to be out of sync. You may even have trouble calling stopCluster, since that also writes to the socket connections.

If you do stop the master, I would recommend calling stopCluster and then creating another cluster object, but keep in mind that the previous workers may not always exit properly. It would be best to verify that the worker processes are dead, and manually kill them if they are not.
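As a rough sketch of that recovery procedure (not a definitive recipe), something like the following could be used. The helper name reset_cluster, the worker count of 2, and the toy foreach body are all made up for illustration. Wrapping stopCluster in try() is only an assumption about how a corrupt cluster might fail; it could also hang instead of raising an error.

```r
library(parallel)
library(doParallel)  # also attaches foreach

# Hypothetical helper: discard a possibly corrupt cluster and register a new one.
reset_cluster <- function(old_cl = NULL, n_workers = 2) {
  if (!is.null(old_cl)) {
    # stopCluster() writes to the worker sockets, so it may itself fail on a
    # corrupt cluster; ignore such an error and continue.
    try(stopCluster(old_cl), silent = TRUE)
  }
  cl <- makeCluster(n_workers)
  registerDoParallel(cl)
  cl
}

cl <- reset_cluster(n_workers = 2)

# Record the worker process IDs up front, so that leftover workers can be
# identified later if the master is interrupted.
worker_pids <- unlist(clusterCall(cl, Sys.getpid))

# Toy task standing in for the real simulation.
out <- foreach(j = 1:4, .combine = rbind) %dopar% c(j, j^2)

stopCluster(cl)
```

If the master is interrupted mid-task, calling cl <- reset_cluster(cl) gives you a fresh cluster. The recorded worker_pids can then be checked against the Windows Task Manager, and any worker that is still alive can be killed there or with tools::pskill(worker_pids).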

Steve Weston answered Feb 26 '23 21:02