I have to run a lot of random forest models, so I want to use doParallel on my server (8 cores) to speed up the process.
However, some models take much longer than others, and some may even throw errors. I would like to run 8 models in parallel; if a model throws an error or is skipped, the remaining workers should simply continue. Each model result is saved to disk so I can access and combine them later.
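For reference, combining the saved results afterwards looks roughly like this (a sketch; it assumes each saved result is an rbind-compatible data frame):

```r
# collect the per-model results written to temp/ and combine them
files <- list.files("temp", pattern = "^result_.*\\.rds$", full.names = TRUE)
results <- lapply(files, readRDS)
combined <- do.call(rbind, results)
```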
Neither tryCatch nor .errorhandling = "remove" solved the problem. I get
Error in unserialize(socklist[[n]]) : error reading from connection
Code example: with %do%, models 2-7 run successfully, yet with %dopar% I get the error shown above.
foreach(model = 1:8, .errorhandling = "remove") %dopar% {
  tryCatch({
    outl <- rf_perform(...)
    # %+% is a custom string-concatenation operator (equivalent to paste0)
    saveRDS(outl, file = getwd() %+% "/temp/result_" %+% model %+% ".rds")
  }, error = function(e) print(e))
}
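For reference, here is a minimal self-contained version of the same pattern. slow_model() is a made-up stand-in for rf_perform() (which isn't shown in full here), and model 3 deliberately fails to demonstrate that .errorhandling = "remove" drops the failed iteration while the others complete:

```r
library(doParallel)

cl <- makeCluster(8)
registerDoParallel(cl)

# hypothetical stand-in for rf_perform(); iteration 3 always errors
slow_model <- function(i) {
  if (i == 3) stop("model ", i, " failed")
  data.frame(model = i, score = runif(1))
}

dir.create("temp", showWarnings = FALSE)

res <- foreach(model = 1:8, .errorhandling = "remove") %dopar% {
  outl <- slow_model(model)
  saveRDS(outl, file = file.path("temp", paste0("result_", model, ".rds")))
  outl
}

stopCluster(cl)
# res contains 7 results; the errored iteration was removed
```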
I think I found the problem: if the objects you export to the workers are too large, R either can no longer serialize them and/or the connection times out.
My exported data object had 5 million rows and 300 variables, and it was exported to 16 workers.
cl <- makeCluster(16)
registerDoParallel(cl)
clusterExport(cl, "data")  # data must not be too large
I downsized the object into smaller pieces and now it works. The authors might want to mention this in the doParallel documentation, or emit a warning when exported objects are too big.
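One way to do the downsizing is to split the big object into one chunk per iteration up front, so each worker only ever receives its own slice instead of the full data frame via clusterExport(). A sketch (assuming data is the large data frame from above, and with the model fit replaced by a placeholder):

```r
library(doParallel)

cl <- makeCluster(16)
registerDoParallel(cl)

# split the large object into 16 roughly equal chunks; foreach then
# serializes only one chunk per iteration to the worker that runs it
chunks <- split(data, rep_len(1:16, nrow(data)))

res <- foreach(chunk = chunks, .errorhandling = "remove") %dopar% {
  nrow(chunk)  # placeholder: fit the real model on this chunk instead
}

stopCluster(cl)
```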