
Spark: driver/worker configuration. Does driver run on Master node?

I am starting a Spark cluster on AWS, with one master node and 60 core nodes:

[Image: EMR cluster configuration with 1 master node and 60 core nodes]

Here is the command used to submit the job, basically 2 executors per core node, 120 executors in total:

spark-submit --deploy-mode cluster --master yarn-cluster --driver-memory 180g --driver-cores 26 --executor-memory 90g --executor-cores 13 --num-executors 120
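
For reference, here is a sketch of the same submission with the flags annotated. The trailing my_app.py is a hypothetical application script (the original command omits the application to run), and --master yarn with --deploy-mode cluster is the current spelling of --master yarn-cluster:

# --deploy-mode cluster    -> the driver runs inside the YARN ApplicationMaster container
# --driver-memory/-cores   -> resources for that driver/AM container
# --executor-memory/-cores -> resources for each executor container
# --num-executors          -> number of executor containers requested from YARN
spark-submit --master yarn --deploy-mode cluster \
    --driver-memory 180g --driver-cores 26 \
    --executor-memory 90g --executor-cores 13 \
    --num-executors 120 \
    my_app.py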

However, the job tracker shows only 119 executors:

[Image: job tracker showing 119 executors]

I expected 1 driver + 120 worker executors. What I saw instead was 119 executors, i.e. 1 driver + 118 working executors.

Does that mean my master node was not used? Does the driver run on the master node or on a core node? Can I make the driver run on the master node and have the 60 core nodes host the 120 working executors?

Thanks!

asked Jan 21 '16 by Edamame


People also ask

Does the Spark driver run on the master node?

The master node hosts the driver program, which drives the application by creating the SparkContext object. The SparkContext works with the cluster manager to manage the different jobs. The worker nodes' job is to execute the tasks and return the results to the master node.

Where does the driver program run in Spark?

The Spark driver is a program that runs on the master node of the cluster and declares transformations and actions on data RDDs. In simple terms, the driver creates the SparkContext, which connects to a given Spark master. It also delivers the RDD graph to the master, where the standalone cluster manager runs.

What is the role of the master node in Spark?

The Spark master is the process that requests resources in the cluster and makes them available to the Spark driver. In all deployment modes, the master negotiates resources or containers with the worker nodes, tracks their status, and monitors their progress.

What is the difference between the Spark driver and the master?

The master is per cluster, and the driver is per application. For standalone/YARN clusters, Spark currently supports two deploy modes. In client mode, the driver is launched in the same process as the client that submits the application.
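
As a minimal sketch of the two modes (my_app.py is a placeholder application):

# Cluster mode: the driver is launched inside the YARN ApplicationMaster container on a worker/core node
spark-submit --master yarn --deploy-mode cluster my_app.py

# Client mode: the driver is launched in the spark-submit process itself, on the machine you submit from
spark-submit --master yarn --deploy-mode client my_app.py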


1 Answer

In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn’t need to stick around for its entire lifetime.
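
One way to confirm where the driver ended up, assuming you have shell access to a cluster node (the application ID below is a placeholder):

# List YARN applications and note the application ID of the Spark job
yarn application -list

# In yarn-cluster mode, the AM host reported here is also the host running the driver
yarn application -status <application_id>

# Driver output goes to the AM container logs rather than to the submitting terminal
yarn logs -applicationId <application_id>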

[Diagram: yarn-cluster mode — the driver runs inside the YARN Application Master]

In yarn-client mode, the Spark driver runs inside the client process that initiates the Spark application.

[Diagram: yarn-client mode — the driver runs in the client process, with the Application Master only requesting resources]
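
So if the goal from the question is to keep the driver on the EMR master node and leave the 60 core nodes for executors, one option is to run spark-submit in client mode from the master node itself. This is only a sketch: the driver-memory value and my_app.py are placeholders, and executors still need to be sized to fit on the core nodes.

# Run from the EMR master node: the driver stays in this spark-submit process on the master node;
# YARN still runs a small ApplicationMaster container plus the executor containers on the core nodes
spark-submit --master yarn --deploy-mode client \
    --driver-memory 20g \
    --executor-memory 90g --executor-cores 13 --num-executors 120 \
    my_app.py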

Have a look at the Cloudera blog for more details.

answered Oct 25 '22 by Ravindra babu