I'm trying to run a gbm model in H2O via R and get one of the following errors:
|========================== | 25%
Polling fails:
<simpleError in .h2o.__poll(client, job_key): Got exception 'class java.lang.RuntimeException', with msg 'java.lang.AssertionError: NewChunk.dst.len = 0, oc._len = 1235'
java.lang.RuntimeException: java.lang.AssertionError: NewChunk.dst.len = 0, oc._len = 1235
at hex.FrameExtractor.getResult(FrameExtractor.java:77)
at water.util.CrossValUtils.crossValidate(CrossValUtils.java:29)
at hex.gbm.GBM.execImpl(GBM.java:201)
at water.Func.exec(Func.java:42)
at water.Job$3.compute2(Job.java:333)
at water.H2O$H2OCountedCompleter.compute(H2O.java:647)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.AssertionError: NewChunk.dst.len = 0, oc._len = 1235
at water.fvec.ChunkSplitter.extractChunkPart(ChunkSplitter.java:44)
at hex.NFoldFrameExtractor$FoldExtractTask.map(NFoldFrameExtractor.java:105)
at water.MRTask2.compute2(MRTask2.java:404)
... 6 more
>
|=========================================================================================================| 100%
Error in .h2o.__remoteSend(data@h2o, model_view, `_modelKey` = xvalKey[i]) :
http://127.0.0.1:54321/2/GBMModelView.json returned the following error:
Model 'GBM_a1b17d68e29d7ba49cb6243293344b69_xval0' not found!
Or this version:
|=================== | 25%
Polling fails:
<simpleError in .h2o.__poll(client, job_key): Got exception 'class java.lang.AssertionError', with msg 'null'
java.lang.AssertionError
at hex.gbm.GBM.buildNextKTrees(GBM.java:505)
at hex.gbm.GBM.buildModel(GBM.java:296)
at hex.gbm.GBM.buildModel(GBM.java:26)
at hex.gbm.SharedTreeModelBuilder.buildModel(SharedTreeModelBuilder.java:276)
at hex.gbm.GBM.execImpl(GBM.java:200)
at water.Func.exec(Func.java:42)
at water.Job.invoke(Job.java:353)
at water.Job$ValidatedJob.genericCrossValidation(Job.java:889)
at hex.gbm.GBM.crossValidate(GBM.java:709)
at water.util.CrossValUtils.crossValidate(CrossValUtils.java:32)
at hex.gbm.GBM.execImpl(GBM.java:201)
at water.Func.exec(Func.java:42)
at water.Job$3.compute2(Job.java:333)
at water.H2O$H2OCountedCompleter.compute(H2O.java:647)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
>
|============================================================================| 100%
Error in tail(res$cms, 1)[[1]] : subscript out of bounds
Here is the line that causes the error:
dat1.gbm <- h2o.gbm(y = 'click_target2', x = xVars, data = train1.hex
, nfolds = 3
, importance = TRUE
, distribution = 'bernoulli'
, n.trees = 200
, interaction.depth = 10
# , n.minobsinnode = 2
, shrinkage = 0.01
)
Any suggestions for what's causing this error?
EDIT:
I've been trying to diagnose whether there's a problem with the CSV file itself, and it appears that may be the issue. I ended up writing a Python script to break the large file into individual CSV files by week_number. About two-thirds of the way through reading the file I get a NULL byte exception. I'm still working on a fix for this.
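For anyone hitting the same NULL byte exception while splitting the file, here is a minimal sketch of one way around it: strip the NUL characters from each line before the csv module sees it, then group rows by week_number. The function names and the sample data are hypothetical, and only the week_number column name comes from the post. Dropping NUL characters can also hide real corruption in the export, so it is worth checking where they come from.

```python
import csv
import io


def strip_nul(lines):
    """Yield each text line with NUL characters removed."""
    for line in lines:
        yield line.replace("\x00", "")


def split_by_column(lines, column="week_number"):
    """Group CSV rows by the value of `column`.

    `lines` is any iterable of text lines, such as an open file.
    Returns (fieldnames, {column value: [row dicts]}).
    """
    reader = csv.DictReader(strip_nul(lines))
    groups = {}
    for row in reader:
        groups.setdefault(row[column], []).append(row)
    return reader.fieldnames, groups


# Hypothetical sample: the third line contains a stray NUL character.
sample = io.StringIO("week_number,clicks\n1,10\n2,\x005\n1,7\n")
fields, groups = split_by_column(sample)
for week, rows in sorted(groups.items()):
    print(week, len(rows))
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` object and write each group out with `csv.DictWriter`. This version holds every row in memory, so a very large file would need one writer per week opened as the rows stream through.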
Both stack traces show the failure occurring inside the crossValidate() method (water.util.CrossValUtils.crossValidate). The cross-validation implementation has been rewritten in the latest version of H2O (H2O-3).
Try the latest stable version from here: http://h2o.ai/download
The original post doesn't include a way to access the data, though, so I can't verify that the issue is actually fixed.