spark executor lost failure

I am using a Databricks Spark cluster (AWS) and testing a Scala experiment. I ran into an issue when training on 10 GB of data with the LogisticRegressionWithLBFGS algorithm. The code block where I hit the issue is as follows:

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS

// training_set is an RDD[LabeledPoint] built from the 10 GB dataset
val algorithm = new LogisticRegressionWithLBFGS()
val model = algorithm.run(training_set)

At first I got a lot of executor lost failures and Java out-of-memory errors. I then repartitioned my training_set into more partitions, and the out-of-memory errors went away, but I still get executor lost failures. The repartitioning looked roughly like the sketch below.
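For reference, the partition count here is just an example, not the exact value I used:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// more, smaller partitions so each task holds less data at once
val repartitioned: RDD[LabeledPoint] = training_set.repartition(400)
val model = algorithm.run(repartitioned)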

My cluster has 72 cores and 500 GB of RAM in total. Can anyone give me some ideas on this?

asked Apr 10 '15 by peng


People also ask

What happens when executor fails in Spark?

If an executor runs into memory issues, the tasks running on it fail and are rescheduled on another executor. If a task still fails after 3 retries (4 attempts in total, the default for spark.task.maxFailures), its stage fails and causes the Spark job as a whole to fail.
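The retry limit is controlled by spark.task.maxFailures; a minimal sketch of raising it, with an illustrative value:

import org.apache.spark.{SparkConf, SparkContext}

// allow up to 8 attempts per task instead of the default 4
// (8 is an illustrative value, not a recommendation)
val conf = new SparkConf()
  .setAppName("lbfgs-training")
  .set("spark.task.maxFailures", "8")
val sc = new SparkContext(conf)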

How do you prevent Spark executors from getting lost when using YARN client mode?

On YARN, the usual fix for this is to increase the off-heap memory headroom, e.g. --conf spark.yarn.executor.memoryOverhead=600. When the cluster runs on Mesos, the analogous setting is spark.mesos.executor.memoryOverhead.
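Set programmatically instead of on the command line, that looks something like the following sketch (600 MB is the value quoted above; size it to your containers):

import org.apache.spark.SparkConf

// extra off-heap headroom per executor, in MB
// (property name as of Spark 1.x; later versions use spark.executor.memoryOverhead)
val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "600")
// on Mesos the analogous property is spark.mesos.executor.memoryOverhead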

Why are executors removed in Spark?

A Spark application with dynamic allocation enabled removes an executor when it has been idle for longer than spark.dynamicAllocation.executorIdleTimeout.
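A minimal sketch of the relevant configuration (the timeout value is illustrative; the default is 60s):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // drop executors that have been idle for more than two minutes
  .set("spark.dynamicAllocation.executorIdleTimeout", "120s")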


1 Answer

LBFGS stores the betas (feature weights) as a dense vector, and that vector is kept entirely in memory. So regardless of how sparse the features in the training set are, the total number of features is what to be mindful of: with, say, 10 million features, a single dense vector of doubles is already about 80 MB, and LBFGS holds several such vectors at once (the weights, gradients, and a history of correction pairs).

So to solve this, either increase executor memory or reduce the total number of features in the training set.
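If you go the feature-reduction route, MLlib's ChiSqSelector is one option; a minimal sketch, assuming the features are non-negative/categorical as the chi-squared test requires (500 is an arbitrary target count):

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.regression.LabeledPoint

// keep only the 500 most predictive features
val selector = new ChiSqSelector(500)
val selectorModel = selector.fit(training_set)
val reducedSet = training_set.map { lp =>
  LabeledPoint(lp.label, selectorModel.transform(lp.features))
}

// the dense weight vector is now 500 doubles instead of one per original feature
val model = new LogisticRegressionWithLBFGS().run(reducedSet)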

answered Oct 11 '22 by Barak1731475