yesterday I updated my R packages and since then parallel execution of the train function fails.
It seems like some functions that are called from within the workers are not available. These functions are at least flatTable and probFunction.
I experiencing this issues on my production machine, and was able to reproduce it on a clean Windows 7 x64 VM.
I added a minimal working example below. Dear users of stackoverflow: Any help is appreciated!
# R 3.0.2 x64, RStudio Version 0.98.490, Windows 7 x64
data(iris)
library(caret) # 6.0-21
library(doParallel) # 1.0.6
model <- "rf"
# Fail
?probFunction
?flatTable
fitControl <- trainControl(
method = "repeatedcv"
, number = 5 ## 5-fold CV
, repeats = 1 ## repeated one times
, verboseIter =TRUE
)
#### Sequential Version ####
# Runs
train(Species ~ ., data = iris, method = model, trControl = fitControl)
#### Parallelized version ####
# Fails with
# Error in e$fun(obj, substitute(ex), parent.frame(), e$data) :
# worker initialization failed: Error in eval(expr, envir, enclos): could not find function "flatTable"
cl <- makeCluster(3)
registerDoParallel(cl)
train(Species ~ ., data = iris, method = model, trControl = fitControl)
stopCluster(cl)
# Fails with
# Error in { : task 1 failed - "could not find function "probFunction""
fitControl <- trainControl(
method = "repeatedcv"
, number = 5 ## 5-fold CV
, repeats = 1 ## repeated one times
, verboseIter =TRUE
, classProbs = TRUE
)
cl <- makeCluster(3)
registerDoParallel(cl)
train(Species ~ ., data = iris, method = model, trControl = fitControl)
stopCluster(cl)
#### Again sequential version ####
# Fails with
# Error in summary.connection(connection) : invalid connection
train(Species ~ ., data = iris, method = model, trControl = fitControl)
R Session Info
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] e1071_1.6-1 class_7.3-9 randomForest_4.6-7 doParallel_1.0.6 iterators_1.0.6
[6] foreach_1.4.1 caret_6.0-21 ggplot2_0.9.3.1 lattice_0.20-23
loaded via a namespace (and not attached):
[1] car_2.0-19 codetools_0.2-8 colorspace_1.2-4 compiler_3.0.2 dichromat_2.0-0
[6] digest_0.6.4 grid_3.0.2 gtable_0.1.2 labeling_0.2 MASS_7.3-29
[11] munsell_0.4.2 nnet_7.3-7 plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5
[16] reshape2_1.2.2 scales_0.2.3 stringr_0.6.2 tools_3.0.2
The error that you're getting is caused by a bug in caret 6.0-21 when using doParallel, doSNOW, and doMPI. It's been fixed in version 6.0-22 in R-forge, but hasn't been released to CRAN yet. If you don't want to wait for the new version to be released, you can:
The problem was caused by a change in CRAN policy that forbids the use of the :::
operator, even when referencing non-exported functions from within the same package.
Update
Caret 6.0-22 was released to CRAN on 2014-01-18. This should resolve the reported problem using caret with doSNOW and similar parallel backends.
The first error (could not find function
...) disappears with newer versions, as suggested by @Steve Weston, but the second error (Error in summary.connection(connection) : invalid connection
) persists.
With caret version 6.0.84, I could fix it by adding allowParallel = F
to the trainControl arguments for the last sequential run.
The last part of the code in the question changes to:
#### Again sequential version (new) ####
fitControl_new <- trainControl(
method = "repeatedcv"
, number = 5
, repeats = 1
, verboseIter =TRUE
, classProbs = TRUE
, allowParallel = F ## add this argument to overwrite the default TRUE
)
train(Species ~ ., data = iris, method = model, trControl = fitControl_new)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With