
How to prevent Spark Executors from getting Lost when using YARN client mode?

I have a Spark job that runs fine locally with a small amount of data, but when I schedule it on YARN I keep getting the following error. Slowly all executors get removed from the UI and my job fails:

15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on myhost1.com: remote Rpc client disassociated
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on myhost2.com: remote Rpc client disassociated

I use the following command to submit the Spark job in yarn-client mode:

 ./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12  /home/myuser/myspark-1.0.jar 

What is the problem here? I am new to Spark.

Asked Jul 30 '15 by Umesh K


People also ask

What is the difference between running spark-submit in YARN client mode vs YARN cluster mode?

In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
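
As a rough sketch (reusing the asker's class and jar names, and the pre-2.x --master shorthand used in this question), the mode is picked on the spark-submit command line:

./spark-submit --class com.xyz.MySpark --master yarn-client /home/myuser/myspark-1.0.jar (driver runs in the client process on the submitting machine)
./spark-submit --class com.xyz.MySpark --master yarn-cluster /home/myuser/myspark-1.0.jar (driver runs inside the YARN application master on the cluster)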

What are executors when we run Spark on YARN?

When running Spark on YARN, each Spark executor runs as a YARN container. Where MapReduce schedules a container and fires up a JVM for each task, Spark hosts multiple tasks within the same container. This approach enables several orders of magnitude faster task startup time.

What happens if a Spark executor fails?

If an executor runs into memory issues, it will fail the task and restart where the last task left off. If that task fails after 3 retries (4 attempts total by default) then that Stage will fail and cause the Spark job as a whole to fail.
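
The retry limit mentioned above is controlled by spark.task.maxFailures, which defaults to 4. As a sketch only, it can be raised at submit time, though doing so masks rather than fixes memory problems:

--conf spark.task.maxFailures=8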

How do you put Spark in YARN mode?

Launching Spark on YARN: Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.
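
A minimal sketch, assuming the Hadoop client configs live under /etc/hadoop/conf (that path is an assumption; point it at your cluster's actual config directory):

export HADOOP_CONF_DIR=/etc/hadoop/conf
./spark-submit --class com.xyz.MySpark --master yarn-client /home/myuser/myspark-1.0.jar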


2 Answers

I had a very similar problem: executors kept being lost no matter how much memory we allocated to them.

The solution, if you're using YARN, was to set --conf spark.yarn.executor.memoryOverhead=600; alternatively, if your cluster uses Mesos, you can try --conf spark.mesos.executor.memoryOverhead=600 instead.

In Spark 2.3.1+ the configuration option is now --conf spark.executor.memoryOverhead=600

It seems we were not leaving sufficient memory for YARN itself, and containers were being killed because of it. After setting the overhead we've had different out-of-memory errors, but not the same lost-executor problem.
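
Applied to the submit command from the question, a sketch with the overhead setting added (everything else left as the asker had it) would look like:

./spark-submit --class com.xyz.MySpark --conf spark.yarn.executor.memoryOverhead=600 --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12 /home/myuser/myspark-1.0.jar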

Answered Oct 04 '22 by whaleberg


You can follow this AWS post to calculate the memory overhead (and other Spark configs to tune): best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr
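
For a back-of-the-envelope sketch, the default overhead in that era of Spark was roughly max(384 MB, ~10% of executor memory), so with the asker's --executor-memory 2G:

default overhead = max(384 MB, 0.10 * 2048 MB) = 384 MB, i.e. a container request of roughly 2048 + 384 = 2432 MB
with spark.yarn.executor.memoryOverhead=600, the container request grows to roughly 2048 + 600 = 2648 MB

These figures are an illustration only; the AWS post covers the calculations in more detail.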

Answered Oct 04 '22 by piritocle