Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Error occurring in caret when running on a cluster

I am running the train function in caret on a cluster via doRedis. For the most part, it works, but every so often I get errors at the very end of this nature:

error calling combine function:
<simpleError: obj$state$numResults <= obj$state$numValues is not TRUE>

and

Error in names(resamples) <- gsub("^\\.", "", names(resamples)) : 
  attempt to set an attribute on NULL

when I run traceback() I get:

5: nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, 
       ppOpts = preProcess, ctrl = trControl, lev = classLevels, 
       ...)
4: train.default(x, y, weights = w, ...)
3: train(x, y, weights = w, ...)
2: train.formula(couple ~ ., training.balanced, method = "nnet", 
       preProcess = "range", tuneGrid = nnetGrid, MaxNWts = 2200)
1: caret::train(couple ~ ., training.balanced, method = "nnet", 
       preProcess = "range", tuneGrid = nnetGrid, MaxNWts = 2200)

These errors are not easily reproducible (i.e. they happen sometimes, but not consistently) and only occur at the end of the run. The stdout on the cluster shows all tasks running and completed, so I am a bit flummoxed.

Has anyone encountered these errors? and if so understand the cause and even better a fix?

like image 818
Tommy Levi Avatar asked Nov 03 '22 21:11

Tommy Levi


1 Answers

I imagine you've already solved this problem, but I ran into the same issue on my cluster consisting of linux and windows systems. I was running the server on ubuntu 14.04 and had noticed the warnings when starting the server service about having 'transparent huge pages' enabled in the linux kernel. I ignored that message and began running training exercises where most of the machines were maxed out with workers. I received the same error at the end of the run:

error calling combine function:
<simpleError: obj$state$numResults <= obj$state$numValues is not TRUE>

After a lot of head scratching and useless tinkering, I decided to address that warning by following the instructions here: http://ubuntuforums.org/showthread.php?t=2255151

Essentially, I installed hugeadm using:

sudo apt-get install hugeadm

Then disabled the transparent pages using:

hugeadm --thp-never

Note that this change will be undone on restart of the computer.

When I re-ran my training process it ran without any errors.

Hope that helps.

Cheers, Eric

like image 115
Eric C. Avatar answered Nov 15 '22 07:11

Eric C.