
Role of master in Spark standalone cluster

In a Spark standalone cluster, what exactly is the role of the master (the node started with the start-master.sh script)?

I understand that it is the node that receives jobs submitted with spark-submit, but what is its role when a job is being processed?

I see in the web UI that it always hands the job off to a slave (a node started with start-slave.sh) and does not take part in the processing itself. Am I right? In that case, should I also run start-slave.sh on the same machine as the master to take advantage of its resources (CPU and memory)?

Thanks in advance.

asked Oct 30 '22 by italktothewind

1 Answer

Spark runs in the following cluster modes:

  • Local
  • Standalone
  • Mesos
  • YARN

The above are cluster managers that offer resources to Spark applications.
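For context, the cluster manager is selected through the --master option of spark-submit. A minimal sketch, where the host names and the application jar are placeholders:

    # Local mode: run on this machine using all available cores
    spark-submit --master local[*] my-app.jar

    # Standalone mode: point at the standalone Master's URL
    spark-submit --master spark://master-host:7077 my-app.jar

    # Mesos and YARN modes
    spark-submit --master mesos://mesos-host:5050 my-app.jar
    spark-submit --master yarn my-app.jar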

Spark standalone mode is a master-slave architecture: there is one Spark Master and a number of Spark Workers. The Spark Master runs on one of the cluster nodes, and the Spark Workers run on the slave nodes of the cluster.

The Spark Master (often called the Standalone Master) is the resource manager of the Spark standalone cluster: it allocates resources (CPU, memory, disk, etc.) among the Spark applications. Those resources are used to run the Spark Driver and the Executors.
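This also answers the second part of the question: the Master only manages resources and does not run Executors itself, so it is common to start a Worker on the master machine as well so that its CPU and memory can be used. A minimal sketch, assuming a standard layout under $SPARK_HOME (master-host is a placeholder; since Spark 3.1 the script is named start-worker.sh):

    # On the master machine: start the standalone Master
    # (it serves a web UI on port 8080 by default)
    $SPARK_HOME/sbin/start-master.sh

    # On each slave machine: start a Worker and register it with the Master
    $SPARK_HOME/sbin/start-slave.sh spark://master-host:7077

    # Optionally also start a Worker on the master machine,
    # so its resources are available for Executors
    $SPARK_HOME/sbin/start-slave.sh spark://master-host:7077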

Spark Workers report to the Spark Master the resources available on their slave nodes.
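The resources a Worker advertises to the Master can be capped in conf/spark-env.sh. A minimal sketch; the values below are illustrative, not recommendations:

    # conf/spark-env.sh on each slave node
    SPARK_WORKER_CORES=8      # cores this Worker offers to the Master
    SPARK_WORKER_MEMORY=16g   # total memory this Worker offers for Executors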


answered Nov 15 '22 by Naga