How to configure automatic restart of the application driver on YARN

From the Spark Programming Guide:

To automatically recover from a driver failure, the deployment infrastructure that is used to run the streaming application must monitor the driver process and relaunch the driver if it fails. Different cluster managers have different tools to achieve this.

  • Spark Standalone - A Spark application driver can be submitted to run within the Spark Standalone cluster (see cluster deploy mode), that is, the application driver itself runs on one of the worker nodes. Furthermore, the Standalone cluster manager can be instructed to supervise the driver, and relaunch it if the driver fails either due to non-zero exit code, or due to failure of the node running the driver. See cluster mode and supervise in the Spark Standalone guide for more details.
  • YARN - Yarn supports a similar mechanism for automatically restarting an application. Please refer to YARN documentation for more details. …

So, the question is: how do I enable this kind of automatic driver restart for Spark Streaming on YARN?
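
For context, the Standalone mechanism quoted above is enabled with spark-submit flags along these lines (a sketch; the master URL, class, and jar names are hypothetical):

    # Standalone cluster mode with driver supervision: the master
    # relaunches the driver if it exits non-zero or its node dies.
    spark-submit \
      --master spark://master-host:7077 \
      --deploy-mode cluster \
      --supervise \
      --class com.example.StreamingApp \
      app.jar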

Thanks and best regards,

Tao

asked May 15 '15 by Tao Li


People also ask

What are the two ways to run Spark on YARN?

In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
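
A minimal sketch of the two submission modes (class and jar names are hypothetical):

    # Cluster mode: the driver runs inside the YARN ApplicationMaster,
    # so the submitting process can exit once the app is accepted.
    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.StreamingApp app.jar

    # Client mode: the driver runs in the local client process; the AM
    # only negotiates resources, so the client must stay alive.
    spark-submit --master yarn --deploy-mode client \
      --class com.example.StreamingApp app.jar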

What is Spark yarn executor memoryOverhead?

spark.yarn.executor.memoryOverhead is the amount of off-heap memory allocated per executor, on top of the heap you request with --executor-memory/spark.executor.memory (the memory used by RDDs and DataFrames). By default it is calculated as a percentage of that executor memory, subject to a fixed minimum.
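
As a rough sizing sketch (assuming the documented default of about 10% of executor memory with a 384 MB floor; names and values below are illustrative): a 4 GB executor gets max(384, 0.10 × 4096) = 410 MB of overhead, so YARN must grant roughly 4506 MB per executor container. The overhead can also be set explicitly, in MB:

    spark-submit --master yarn --deploy-mode cluster \
      --executor-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=410 \
      --class com.example.StreamingApp app.jar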

What happens when a YARN cluster is started?

In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn't need to stick around for its entire lifetime.
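
In practice this means you can submit in cluster mode, disconnect, and inspect the application later through the YARN CLI (the application id below is hypothetical):

    spark-submit --master yarn --deploy-mode cluster \
      --class com.example.StreamingApp app.jar
    # Later, from any machine with the YARN client configuration:
    yarn application -list
    yarn logs -applicationId application_1431652700000_0001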


2 Answers

As documented here: https://spark.apache.org/docs/latest/running-on-yarn.html

spark.yarn.maxAppAttempts -
"The maximum number of attempts that will be made to submit the application. It should be no larger than the global number of max attempts in the YARN configuration."

to set "global number of max attempts in the YARN configuration":

https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

yarn.resourcemanager.am.max-attempts - "The maximum number of application attempts. It's a global setting for all application masters. Each application master can specify its individual maximum number of application attempts via the API, but the individual number cannot be more than the global upper bound. If it is, the resourcemanager will override it. The default number is set to 2, to allow at least one retry for AM"
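
Putting the two settings together, a sketch might look like this (the attempt count and application names are illustrative):

    spark-submit --master yarn --deploy-mode cluster \
      --conf spark.yarn.maxAppAttempts=4 \
      --class com.example.StreamingApp app.jar

with the global ceiling raised to match in yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.am.max-attempts</name>
      <value>4</value>
    </property>

For a long-running streaming job, also look at spark.yarn.am.attemptFailuresValidityInterval if your Spark version has it; it resets the attempt counter after a period of healthy operation, so occasional failures spread over months don't permanently exhaust the retries.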

answered Oct 26 '22 by Amir Mamo


What you are looking for is the set of instructions to launch your application in YARN "cluster mode": https://spark.apache.org/docs/latest/running-on-yarn.html

This means that your driver application runs on the cluster on YARN (not on your local machine). As such, it can be restarted by YARN if it fails.
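
On its own, though, a relaunched driver starts from scratch. The documented Spark Streaming pattern pairs the restart with checkpointing, so the new driver recovers its context instead of rebuilding it. A minimal sketch (the checkpoint path, host, and port are hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RestartableStreamingApp {
      // Must live on HDFS (or another shared filesystem) so a driver
      // relaunched on a different node can still read it.
      val checkpointDir = "hdfs:///user/app/checkpoint"

      def createContext(): StreamingContext = {
        val conf = new SparkConf().setAppName("RestartableStreamingApp")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir)
        // Hypothetical source and computation, so output is registered.
        val lines = ssc.socketTextStream("stream-host", 9999)
        lines.count().print()
        ssc
      }

      def main(args: Array[String]): Unit = {
        // First launch builds a fresh context; after YARN relaunches
        // the driver, the context is recovered from the checkpoint.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }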

answered Oct 26 '22 by Francois G