I am starting a Spark cluster on AWS with one master node and 60 core nodes.
Here is the command to start it up, basically 2 executors per core node, 120 executors in total:
spark-submit --deploy-mode cluster --master yarn-cluster --driver-memory 180g --driver-cores 26 --executor-memory 90g --executor-cores 13 --num-executors 120
However, in the job tracker, there are only 119 executors:
I thought there should be 1 driver + 120 worker executors. What I saw instead was 119 executors, which includes 1 driver + 118 working executors.
Does that mean my master node was not used? Is the driver running on the master node or a core node? Can I make the driver run on the master node and let the 60 core nodes host 120 working executors?
Thanks!
The master node hosts the driver program, which drives the application by creating the SparkContext object. The SparkContext works with the cluster manager to manage the different jobs. The worker nodes' job is to execute the tasks and return the results to the master node.
The Spark driver is the program that declares the transformations and actions on the data RDDs. In simple terms, the driver in Spark creates the SparkContext, connected to a given Spark master. It also delivers the RDD graphs to the master, where the standalone cluster manager runs.
The Spark master is the process that requests resources in the cluster and makes them available to the Spark driver. In all deployment modes, the master negotiates resources or containers with the worker nodes, tracks their status, and monitors their progress.
The master is per cluster, and the driver is per application. For standalone and YARN clusters, Spark currently supports two deploy modes. In client mode, the driver is launched in the same process as the client that submits the application.
In yarn-cluster mode, the driver runs in the Application Master. This means that the same process is responsible for both driving the application and requesting resources from YARN, and this process runs inside a YARN container. The client that starts the app doesn’t need to stick around for its entire lifetime.
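For example, on recent Spark versions your submission in cluster mode would look roughly like this, with --master yarn --deploy-mode cluster replacing the older yarn-cluster value (the --class and jar names here are only placeholders):
spark-submit --master yarn --deploy-mode cluster --driver-memory 180g --driver-cores 26 --executor-memory 90g --executor-cores 13 --num-executors 120 --class com.example.MyApp my-app.jar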
In yarn-client mode, the Spark driver runs inside the client process that initiates the Spark application.
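So if you want the driver to stay on the node you run spark-submit from (on EMR that is normally the master node) and leave the core nodes to host only executors, submitting in client mode is one option. A rough sketch, where the driver-memory value and the --class and jar names are only illustrative and the driver memory must fit on the master node:
spark-submit --master yarn --deploy-mode client --driver-memory 20g --executor-memory 90g --executor-cores 13 --num-executors 120 --class com.example.MyApp my-app.jar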
Have a look at the Cloudera blog for more details.