
Deploy mode in "SPARK-SUBMIT"

In spark-submit, what is the difference between the "yarn", "yarn-cluster", and "yarn-client" deploy modes?

# Use yarn-client instead of yarn-cluster for client mode.
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --executor-memory 20G \
  --num-executors 50 \
  /path/to/examples.jar \
  1000

https://spark.apache.org/docs/1.1.0/submitting-applications.html

user3279189 asked Dec 18 '14 19:12


1 Answer

For Spark on YARN, you can specify either yarn-client or yarn-cluster as the master. yarn-client runs the driver program in the same JVM as spark-submit, while yarn-cluster runs the Spark driver in a container on one of the NodeManagers.

From the documentation (https://spark.apache.org/docs/1.1.0/running-on-yarn.html):

"There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN."
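As for the plain "yarn" value in the question: later Spark releases prefer --master yarn combined with a separate --deploy-mode flag, which defaults to client when omitted. The sketch below maps the three master strings onto that form. The function name normalize_master is hypothetical, made up for illustration, and is not part of Spark; check the documentation for your Spark version before relying on the mapping.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): translate a master string
# into the equivalent --master / --deploy-mode flags.
normalize_master() {
  case "$1" in
    yarn-cluster)
      # Driver runs inside the YARN application master on the cluster.
      echo "--master yarn --deploy-mode cluster" ;;
    yarn-client)
      # Driver runs in the local spark-submit client process.
      echo "--master yarn --deploy-mode client" ;;
    yarn)
      # Assumption: no --deploy-mode given, so spark-submit's default (client) applies.
      echo "--master yarn --deploy-mode client" ;;
    *)
      # Any other master URL is passed through unchanged.
      echo "--master $1" ;;
  esac
}

# Print the flags for each value mentioned in the question.
for m in yarn yarn-cluster yarn-client; do
  echo "$m => $(normalize_master "$m")"
done
```

With this mapping, the command from the question would be written with --master yarn --deploy-mode cluster in place of --master yarn-cluster.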

suztomo answered Sep 20 '22 13:09