What is the relationship between workers, worker instances, and executors?

In Spark Standalone mode, there are master and worker nodes.

Here are a few questions:

  1. Do 2 worker instances mean one worker node running 2 worker processes?
  2. Does every worker instance hold an executor for a specific application (which manages storage and tasks), or does one worker node hold one executor?
  3. Is there a flow chart explaining how Spark works at runtime, for example on a word count?
asked Jul 11 '14 11:07 by edwardsbean

People also ask

Are a worker and an executor the same?

The memory of a Spark cluster worker node is shared between HDFS, YARN and other daemons, and the executors for Spark applications. Each worker node hosts executors; an executor is a process launched on a worker node for a specific Spark application.

How many executors can a worker node have in Spark?

In Spark Standalone mode, there are a master node and worker nodes. Each worker can run multiple executors if enough CPU and memory are available.

How many executors can a worker have?

In a standalone cluster you will get one executor per worker unless you play with `spark.executor.cores` and the worker has enough cores to hold more than one executor. When I start an application with default settings, Spark greedily acquires as many cores and executors as the scheduler offers. (A configuration sketch appears in the answer below.)

What are Spark executors?

Executors in Spark are worker processes that run the individual tasks of a given Spark job. They are launched at the start of a Spark application, and as tasks complete, their results are sent back to the driver.


1 Answer

I suggest reading the Spark cluster docs first, but even more so this Cloudera blog post explaining these modes.

Your first question depends on what you mean by 'instances'. A node is a machine, and there is rarely a good reason to run more than one worker per machine (the SPARK_WORKER_INSTANCES setting in spark-env.sh controls this, and it defaults to 1). So two worker nodes typically means two machines, each running a single Spark worker.

Workers hold many executors, for many applications. One application has executors on many workers.
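
For concreteness, here is a minimal sketch of the application-side settings that govern executor sizing in standalone mode. The master URL and the resource numbers are placeholders, not recommendations, and assume a worker offering 8 cores and 16g of memory:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sizing: with 2 cores and 4g per executor, a worker
// offering 8 cores and 16g could host up to 4 executors of this
// application, subject to the spark.cores.max cap below.
val conf = new SparkConf()
  .setAppName("sizing-example")
  .setMaster("spark://master-host:7077") // placeholder master URL
  .set("spark.executor.cores", "2")      // cores per executor
  .set("spark.executor.memory", "4g")    // memory per executor
  .set("spark.cores.max", "6")           // total cores this app may take

val sc = new SparkContext(conf)
```

Without `spark.executor.cores`, the standalone master launches one executor per worker for the application and gives it as many cores as the worker offers.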

Your third question is not clear.
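
If it is asking what the runtime flow of a word count looks like, here is a minimal sketch (the HDFS paths are placeholders), with comments noting which part of the runtime does what:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("word-count"))

    // The driver only records this lineage of transformations;
    // nothing executes yet.
    val counts = sc.textFile("hdfs:///input/text.txt") // placeholder path
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // shuffle boundary: the job splits into two stages here

    // The action below triggers the job: the driver schedules tasks
    // stage by stage, executors on the workers run them, and the
    // final output is written out.
    counts.saveAsTextFile("hdfs:///output/counts") // placeholder path
  }
}
```

Each stage runs as a set of parallel tasks, one per partition, on whatever executors the application has been granted.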

answered Oct 13 '22 05:10 by Sean Owen