R doParallel foreach with error handling for independent workers

I have to run a lot of random forest models, so I want to use doParallel on my 8-core server to speed up the process.

However, some models take much longer than others, and some may throw errors. I would like to run 8 models in parallel; if a model throws an error or is skipped, the remaining workers should simply continue. Each model's result is saved to the hard drive so I can access and combine the results later.

Neither tryCatch nor

 .errorhandling="remove"

solved the problem. I still get

 Error in unserialize(socklist[[n]]) : error reading from connection

Code example: with %do%, models 2-7 run successfully; with %dopar%, I get the error shown above:

 foreach(model = 1:8, .errorhandling = "remove") %dopar% {
     tryCatch({
         outl <- rf_perform(...)
         # %+% is a custom string-concatenation operator
         saveRDS(outl, file = getwd() %+% "/temp/result_" %+% model %+% ".rds")
     }, error = function(e) print(e))
 }
asked Oct 18 '22 by user670186
1 Answer

I think I found the problem: if the objects you export to the workers are too large, R can no longer handle the serialization and/or the connection times out.

The data object I exported had 5 million rows and 300 variables, sent to 16 workers.

library(doParallel)

cl <- makeCluster(16)
registerDoParallel(cl)
clusterExport(cl, "data")  # "data" must not be too large

I downsized the object into smaller pieces and now it works. The authors might want to mention this in the doParallel documentation, or emit a warning when exported objects are too large.

answered Oct 21 '22 by user670186