Spark task size too big

I'm using L-BFGS logistic regression to classify examples into one of two categories. When I'm training the model, I get many warnings of this kind:

WARN scheduler.TaskSetManager: Stage 132 contains a task of very large size (109 KB). The maximum recommended task size is 100 KB.
WARN scheduler.TaskSetManager: Stage 134 contains a task of very large size (102 KB). The maximum recommended task size is 100 KB.
WARN scheduler.TaskSetManager: Stage 136 contains a task of very large size (109 KB). The maximum recommended task size is 100 KB.

I have about 94 features and about 7500 training examples. Is there some other argument I should pass in order to break up the task size into smaller chunks?

Also, is this just a warning that can be ignored in the worst case, or does it hamper the training?

I'm calling my trainer this way:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// trainingData is an RDD[LabeledPoint]
val lr_lbfgs = new LogisticRegressionWithLBFGS().setNumClasses(2)
lr_lbfgs.optimizer.setRegParam(reg).setNumIterations(numIterations)
val model = lr_lbfgs.run(trainingData)

Also, my driver and executor memory are each 20 GB, which I set as arguments to spark-submit.
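
For reference, a sketch of how those flags would be passed; the class name and jar path are placeholders, not from the original post:

spark-submit \
  --class com.example.Train \
  --driver-memory 20G \
  --executor-memory 20G \
  my-app.jar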

asked Oct 17 '22 by shashydhar

1 Answer

Spark serializes every variable and method referenced by a task's closure and ships a copy to the executors with each task; this warning means that, in total, the serialized task exceeds the recommended 100 KB. You can safely ignore it if it doesn't noticeably impact performance, or you can mark large read-only variables as broadcast variables so they are shipped to each executor only once instead of with every task.
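
A minimal sketch of the broadcast approach, assuming the closure captures some large read-only object; BroadcastSketch, lookupTable, and the local master are hypothetical stand-ins, not from the question:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-sketch").setMaster("local[*]"))

    // Hypothetical large read-only object that a task closure would otherwise
    // capture, getting serialized into every task and inflating the task size.
    val lookupTable: Map[Int, Double] =
      (0 until 100000).map(i => i -> i * 0.5).toMap

    // Broadcasting ships the object to each executor once; tasks then read
    // the shared copy via .value instead of carrying their own serialized copy.
    val lookupBc = sc.broadcast(lookupTable)

    val scaled = sc.parallelize(0 until 10000)
      .map(i => lookupBc.value.getOrElse(i, 0.0))
    println(scaled.take(5).mkString(", "))

    sc.stop()
  }
}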

answered Oct 21 '22 by user4601931