
What happens if the driver program crashes?

Tags:

apache-spark

I understand how worker nodes are fault tolerant, but what happens if your driver program crashes for some unexpected reason (power outage, memory issue, etc.)?

I would imagine you would lose all work, since the code reading the results is no longer running. Or does Spark somehow know how to restart it? If so, how?

asked Oct 28 '14 by Eran Medan

4 Answers

As per the Spark documentation:

Spark Standalone - A Spark application driver can be submitted to run within the Spark Standalone cluster (see cluster deploy mode), that is, the application driver itself runs on one of the worker nodes. Furthermore, the Standalone cluster manager can be instructed to supervise the driver, and relaunch it if the driver fails either due to non-zero exit code, or due to failure of the node running the driver. See cluster mode and supervise in the Spark Standalone guide for more details.

So --supervise only works in Standalone cluster mode. If your application is submitted in YARN cluster mode, the driver runs inside the Application Master and YARN handles restarting it, up to the number of attempts allowed on the YARN side (yarn.resourcemanager.am.max-attempts in yarn-site.xml, optionally lowered per application with spark.yarn.maxAppAttempts). Your code should therefore remove the output directory and start from scratch on each attempt, otherwise the relaunched driver will fail with an "output directory already exists" error.
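For illustration, a hedged sketch of a YARN cluster-mode submission along these lines (the class name, jar path, output path, and attempt count are placeholders, not taken from the original answer):

hadoop fs -rm -r -f /user/me/output   # clear the previous attempt's output (placeholder path)

./bin/spark-submit \
  --class com.example.MyApp \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  /path/to/my-app.jar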

answered by Dhimant Jayswal


As @zsxwing points out, it depends on how you run your driver. In addition to running on YARN, you can also run your job with a deploy mode of cluster (this is a parameter to spark-submit). In Spark Streaming you specify --supervise and Spark will restart the job for you. The details are in the Spark Streaming Guide.
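As a hedged illustration (class name, master URL, and jar path are placeholders), a streaming job submitted so that the standalone master supervises its driver might look like the command below; note that the application itself must also enable checkpointing (ssc.checkpoint(...)) for streaming state to be recovered after the driver is relaunched:

# Standalone cluster deploy mode with driver supervision
./bin/spark-submit \
  --class com.example.MyStreamingApp \
  --master spark://master-host:7077 \
  --deploy-mode cluster \
  --supervise \
  /path/to/my-streaming-app.jar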

answered by Holden


We can use ZooKeeper or the local file system to configure high availability for the standalone master; see the official documentation:

http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
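As a rough sketch based on that page (hostnames and the ZooKeeper directory are placeholders), ZooKeeper-based recovery is enabled on each standalone master through spark-env.sh; with a standby master in place, applications and drivers that are already running are not affected by a master failure:

# conf/spark-env.sh on every master node
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"

For single-node setups, the same page also describes a local-file-system recovery mode (spark.deploy.recoveryMode=FILESYSTEM with spark.deploy.recoveryDirectory).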

answered by Sandeep Purohit


Yes, you can restart Spark applications. There are a few options available, specific to the cluster manager being used. For example, with a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To enumerate all such options available to spark-submit, run it with --help:

Run on a Spark standalone cluster in cluster deploy mode with supervise

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://207.184.161.138:7077 \
  --deploy-mode cluster \
  --supervise \
  /path/to/examples.jar \
  1000

answered by Nilesh