I'm using LBFGS logistic regression to classify examples into one of two categories. While training the model, I get many warnings of this kind:
WARN scheduler.TaskSetManager: Stage 132 contains a task of very large size (109 KB). The maximum recommended task size is 100 KB.
WARN scheduler.TaskSetManager: Stage 134 contains a task of very large size (102 KB). The maximum recommended task size is 100 KB.
WARN scheduler.TaskSetManager: Stage 136 contains a task of very large size (109 KB). The maximum recommended task size is 100 KB.
I have 94 features and about 7,500 training examples. Is there some argument I should pass in order to break the tasks up into smaller chunks?
Also, is this just a warning that, in the worst case, can be ignored? Or does it hamper the training?
I'm calling my trainer this way:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// Binary logistic regression trained with L-BFGS;
// reg and numIterations are defined earlier in my code
val lr_lbfgs = new LogisticRegressionWithLBFGS().setNumClasses(2)
lr_lbfgs.optimizer.setRegParam(reg).setNumIterations(numIterations)
val model = lr_lbfgs.run(trainingData)
Also, my driver and executor memory are both 20G, which I set as arguments to spark-submit.
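For reference, the submit command looks roughly like this (the class and jar names here are placeholders, not my real ones):

spark-submit \
  --class com.example.TrainLR \
  --driver-memory 20g \
  --executor-memory 20g \
  my-app.jar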
Spark serializes a copy of every variable and method that needs to be visible to the executors into each task's closure; this warning means that, in total, those objects exceed 100 KB per task. You can safely ignore the warning if it doesn't noticeably impact performance, or you can mark large read-only variables as broadcast variables so each executor receives them once instead of once per task.
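As a minimal sketch of the broadcast approach (assuming an existing SparkContext sc and the trainingData RDD of LabeledPoint from your question; the featureScales array is a hypothetical stand-in for whatever large local object your closure captures):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def scaleFeatures(sc: SparkContext,
                  trainingData: RDD[LabeledPoint],
                  featureScales: Array[Double]): RDD[LabeledPoint] = {
  // Ship the large read-only array to each executor once,
  // instead of serializing it into every task closure
  val scalesBc = sc.broadcast(featureScales)
  trainingData.map { p =>
    // Inside the closure, read the broadcast value via .value
    val scaled = p.features.toArray.zip(scalesBc.value).map { case (x, s) => x * s }
    LabeledPoint(p.label, Vectors.dense(scaled))
  }
}

The broadcast value is deserialized once per executor and shared across all of its tasks, which keeps the per-task closure small.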