I understand how worker nodes are fault tolerant, but what happens if your driver program crashes for some unexpected reason (power loss, memory issue, etc.)?
I would imagine you would lose all work, since the code reading the results is no longer running. Or does Spark somehow know how to restart it? If so, how?
As per the Spark documentation:
Spark Standalone - A Spark application driver can be submitted to run within the Spark Standalone cluster (see cluster deploy mode), that is, the application driver itself runs on one of the worker nodes. Furthermore, the Standalone cluster manager can be instructed to supervise the driver, and relaunch it if the driver fails either due to non-zero exit code, or due to failure of the node running the driver. See cluster mode and supervise in the Spark Standalone guide for more details.
So --supervise only works in Standalone cluster mode. If your application is submitted in YARN cluster mode, the driver runs inside the application master, and YARN handles restarting it up to the number of attempts configured on the cluster (yarn.resourcemanager.am.max-attempts in yarn-site.xml; Spark's spark.yarn.maxAppAttempts can lower this per application). Your code should therefore remove the output directory and start from scratch on each attempt (as in the sketch below), otherwise the relaunched driver will fail with an "output directory already exists" error.
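One way to do that, sketched here purely for illustration (the RestartableJob name, the output path, and the dummy computation are assumptions, not from the answer above), is to delete the output directory through the Hadoop FileSystem API at the start of the driver so a relaunched attempt can write cleanly:
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object RestartableJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RestartableJob").getOrCreate()

    // Hypothetical output location; replace with your own path.
    val outputDir = "hdfs:///tmp/restartable-job/output"

    // Delete whatever a previous (failed) attempt left behind so the relaunched
    // driver does not fail with "output directory already exists".
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    fs.delete(new Path(outputDir), true) // recursive; returns false if the path did not exist

    // ...recompute the result from scratch and write it out (dummy data here)...
    spark.range(0, 1000).rdd.saveAsTextFile(outputDir)

    spark.stop()
  }
}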
As @zsxwing points out, it depends on how you run your driver. In addition to running on YARN, you can also run your job with a deploy mode of cluster (this is a parameter to spark-submit). For a Spark Streaming application you additionally specify --supervise and Spark will restart the driver for you; the details (including the checkpointing the restarted driver relies on, sketched below) are in the Spark Streaming Guide.
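Restarting the driver only helps a streaming job if it can rebuild its context and state, which the Spark Streaming Guide describes via checkpointing and StreamingContext.getOrCreate. A minimal sketch, assuming a placeholder checkpoint directory and a socket text source that are not part of the original answer:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SupervisedStreamingApp {
  // Hypothetical checkpoint location on a fault-tolerant file system (e.g. HDFS).
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("SupervisedStreamingApp")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Example source and computation; replace with your real stream.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.count().print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // On a clean start this calls createContext(); after a driver restart
    // (e.g. triggered by --supervise) it rebuilds the context from checkpoint data.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}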
You can also configure high availability for the standalone Master using ZooKeeper, or a local file system for single-node recovery. You can check it in the official documentation:
http://spark.apache.org/docs/latest/spark-standalone.html#high-availability
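Per that page, ZooKeeper-based Master recovery is enabled by setting spark.deploy.recoveryMode and the ZooKeeper properties, typically through SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh (the ZooKeeper addresses below are placeholders; single-node recovery instead uses spark.deploy.recoveryMode=FILESYSTEM with spark.deploy.recoveryDirectory):
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"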
Yes, you can restart Spark applications. There are a few options available, specific to the cluster manager being used. For example, with a Spark standalone cluster in cluster deploy mode, you can also specify --supervise to make sure that the driver is automatically restarted if it fails with a non-zero exit code. To enumerate all such options available to spark-submit, run it with --help:
# Run on a Spark standalone cluster in cluster deploy mode with supervise
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://207.184.161.138:7077 \
--deploy-mode cluster \
--supervise \
/path/to/examples.jar \
1000