I am running the train function in caret on a cluster via doRedis. For the most part it works, but every so often I get errors like these at the very end:
error calling combine function:
<simpleError: obj$state$numResults <= obj$state$numValues is not TRUE>
and
Error in names(resamples) <- gsub("^\\.", "", names(resamples)) :
attempt to set an attribute on NULL
When I run traceback(), I get:
5: nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,
ppOpts = preProcess, ctrl = trControl, lev = classLevels,
...)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(couple ~ ., training.balanced, method = "nnet",
preProcess = "range", tuneGrid = nnetGrid, MaxNWts = 2200)
1: caret::train(couple ~ ., training.balanced, method = "nnet",
preProcess = "range", tuneGrid = nnetGrid, MaxNWts = 2200)
These errors are not easily reproducible (they happen sometimes, but not consistently) and only occur at the end of the run. The stdout on the cluster shows all tasks running and completing, so I am a bit flummoxed.
Has anyone encountered these errors? If so, do you understand the cause, or better yet, know a fix?
I imagine you've already solved this problem, but I ran into the same issue on my cluster of Linux and Windows machines. I was running the Redis server on Ubuntu 14.04 and had noticed warnings, when starting the server service, about 'transparent huge pages' being enabled in the Linux kernel. I ignored that message and started running training jobs with most of the machines maxed out with workers. At the end of the run I received the same error:
error calling combine function:
<simpleError: obj$state$numResults <= obj$state$numValues is not TRUE>
After a lot of head-scratching and fruitless tinkering, I decided to address that warning by following the instructions here: http://ubuntuforums.org/showthread.php?t=2255151
Essentially, I installed hugeadm (on Ubuntu it is provided by the hugepages package) using:
sudo apt-get install hugepages
Then I disabled transparent huge pages using:
sudo hugeadm --thp-never
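On most Linux kernels, hugeadm --thp-never should have the same effect as writing never to the kernel's THP control file in sysfs. Below is a minimal sketch for checking which mode is active before and after the change. It assumes the common path /sys/kernel/mm/transparent_hugepage/enabled (this may differ on some distributions), and the helper name thp_mode is hypothetical, not part of hugeadm or Redis.

```shell
#!/bin/sh
# Hypothetical helper: extract the active THP mode, i.e. the bracketed word
# in a line such as "always madvise [never]".
thp_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)].*/\1/p'
}

# Assumed sysfs path; it may differ on some distributions.
THP_FILE=/sys/kernel/mm/transparent_hugepage/enabled

if [ -r "$THP_FILE" ]; then
  echo "Active THP mode: $(thp_mode "$(cat "$THP_FILE")")"
else
  echo "THP control file not found: $THP_FILE"
fi
```

After the hugeadm step, this should report never; before it, a default Ubuntu 14.04 kernel would typically report always.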
Note that this change will be undone on restart of the computer.
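Because the setting is lost on reboot, it has to be reapplied at boot time. The Redis startup warning itself suggests running echo never > /sys/kernel/mm/transparent_hugepage/enabled as root and adding that line to /etc/rc.local. Below is a minimal sketch under that assumption. The function disable_thp is hypothetical and takes the control file as an argument, so you can try it on an ordinary file before pointing it at sysfs.

```shell
#!/bin/sh
# Hypothetical helper: write "never" to a THP control file if it is writable.
# Returns non-zero if the file cannot be written (e.g. not running as root).
disable_thp() {
  target=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
  if [ -w "$target" ]; then
    echo never > "$target"
  else
    echo "cannot write $target (are you root?)" >&2
    return 1
  fi
}

# Lines like these could go in /etc/rc.local (before its final "exit 0")
# so the setting is reapplied on every boot:
#   if [ -f /sys/kernel/mm/transparent_hugepage/enabled ]; then
#     echo never > /sys/kernel/mm/transparent_hugepage/enabled
#   fi
```

Adding transparent_hugepage=never to the kernel boot parameters is an alternative that avoids relying on rc.local.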
When I re-ran my training process it ran without any errors.
Hope that helps.
Cheers, Eric