From the Spark Programming Guide:
To automatically recover from a driver failure, the deployment infrastructure that is used to run the streaming application must monitor the driver process and relaunch the driver if it fails. Different cluster managers have different tools to achieve this.
Spark Standalone - the driver can be submitted in cluster deploy mode with the --supervise flag, and the Standalone cluster manager will relaunch it if it fails.
So the question is how to support auto-restart for Spark Streaming on YARN.
Tao
In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
spark.yarn.executor.memoryOverhead is just taken as a maximum: the default is the larger of a fixed minimum and a percentage of the real executor memory used by RDDs and DataFrames (--executor-memory / spark.executor.memory).
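As a rough worked example (a sketch: in Spark releases of that era the factor was about 10% with a 384 MB floor, but the exact defaults vary by version):

overhead = max(384 MB, 0.10 * 8192 MB) = 819 MB for --executor-memory 8g,
so each executor container requests roughly 8192 + 819 = 9011 MB from YARN.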
In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn't need to stick around for its entire lifetime.
as documented here: https://spark.apache.org/docs/latest/running-on-yarn.html
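For reference, a minimal cluster-mode submission might look like this (the class and jar names are placeholders; on older releases the equivalent flag is --master yarn-cluster):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.StreamingApp \
  my-streaming-app.jar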
spark.yarn.maxAppAttempts -
"The maximum number of attempts that will be made to submit the application. It should be no larger than the global number of max attempts in the YARN configuration."
To set the "global number of max attempts in the YARN configuration", see:
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
yarn.resourcemanager.am.max-attempts - "The maximum number of application attempts. It's a global setting for all application masters. Each application master can specify its individual maximum number of application attempts via the API, but the individual number cannot be more than the global upper bound. If it is, the resourcemanager will override it. The default number is set to 2, to allow at least one retry for AM"
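Putting the two together, a sketch for allowing up to four AM attempts (the values and app names are illustrative): first raise the global cap in yarn-site.xml on the ResourceManager,

<property>
  <name>yarn.resourcemanager.am.max-attempts</name>
  <value>4</value>
</property>

then set the per-application limit at submit time, keeping it at or below the global bound:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --class com.example.StreamingApp my-streaming-app.jar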
What you are looking for is the set of instructions to launch your application in YARN "cluster mode": https://spark.apache.org/docs/latest/running-on-yarn.html
This means that your driver application runs on the cluster, managed by YARN (not on your local machine). As such, it can be restarted by YARN if it fails.
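One caveat worth adding: a YARN restart gives you a fresh driver process, but for a Spark Streaming job to pick up where it left off, the Programming Guide pairs this with checkpointing and StreamingContext.getOrCreate. A minimal sketch in Scala (the checkpoint path, socket source, and batch interval are all placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingApp {
  // Placeholder: checkpoint data must live on fault-tolerant storage such as HDFS
  val checkpointDir = "hdfs:///user/example/checkpoints/streaming-app"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("StreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // Placeholder computation so the context has an output operation
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On a relaunch, getOrCreate rebuilds the context from the checkpoint
    // instead of invoking createContext again
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}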