From the Spark Programming Guide:
To automatically recover from a driver failure, the deployment infrastructure that is used to run the streaming application must monitor the driver process and relaunch the driver if it fails. Different cluster managers have different tools to achieve this.
Spark Standalone - the driver can be submitted in cluster deploy mode with the --supervise flag, and the Standalone cluster manager will relaunch it if it fails.
So the question is how to support auto-restart for Spark Streaming on YARN.
Tao
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
spark.yarn.executor.memoryOverhead is just taken as a maximum: the default is the larger of a fixed minimum and a percentage of the real executor memory used by RDDs and DataFrames (--executor-memory / spark.executor.memory).
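As a rough worked example (a sketch: in Spark releases of that era the factor was about 10% with a 384 MB floor, but the exact defaults vary by version):

overhead = max(384 MB, 0.10 * 8192 MB) = 819 MB for --executor-memory 8g,
so each executor container requests roughly 8192 + 819 = 9011 MB from YARN.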
In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn't need to stick around for its entire lifetime.
as documented here: https://spark.apache.org/docs/latest/running-on-yarn.html
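For reference, a minimal cluster-mode submission might look like this (the class and jar names are placeholders; on older releases the equivalent flag is --master yarn-cluster):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  my-streaming-app.jar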
spark.yarn.maxAppAttempts -
"The maximum number of attempts that will be made to submit the application. It should be no larger than the global number of max attempts in the YARN configuration."
To set the "global number of max attempts in the YARN configuration", see:
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
yarn.resourcemanager.am.max-attempts - "The maximum number of application attempts. It's a global setting for all application masters. Each application master can specify its individual maximum number of application attempts via the API, but the individual number cannot be more than the global upper bound. If it is, the resourcemanager will override it. The default number is set to 2, to allow at least one retry for AM"
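Putting the two together, a sketch for allowing up to four AM attempts (the values and app names are illustrative): first raise the global cap in yarn-site.xml on the ResourceManager,

<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>

then set the per-application limit at submit time, keeping it at or below the global bound:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --class com.example.StreamingApp my-streaming-app.jar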
What you are looking for is the set of instructions to launch your application in YARN "cluster mode": https://spark.apache.org/docs/latest/running-on-yarn.html
This means that your driver application runs on the cluster, managed by YARN (not on your local machine). As such, it can be restarted by YARN if it fails.
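One caveat worth adding: a YARN restart gives you a fresh driver process, but for a Spark Streaming job to pick up where it left off, the Programming Guide pairs this with checkpointing and StreamingContext.getOrCreate. A minimal sketch in Scala (the checkpoint path, socket source, and batch interval are all placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingApp {
  // Placeholder: checkpoint data must live on fault-tolerant storage such as HDFS
  val checkpointDir = "hdfs:///user/example/checkpoints/streaming-app"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("StreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // Placeholder computation so the context has an output operation
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On a relaunch, getOrCreate rebuilds the context from the checkpoint
    // instead of invoking createContext again
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}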