What is the difference between the driver and the application master in Spark?

I couldn't figure out the difference between the Spark driver and the application master. Basically, what are their responsibilities in running an application, and who does what?

In client mode, the client machine runs the driver and the app master runs on one of the cluster nodes. In cluster mode, the client runs neither; the driver and the app master run on the same node (one of the cluster nodes).

What exactly are the operations that the driver performs, and which ones does the app master perform?

References:

  • Spark Driver memory and Application Master memory
  • Spark yarn cluster vs client - how to choose which one to use?
Asked by newbie, Sep 16 '20 06:09


1 Answer

As per the Spark documentation:

Spark Driver:

The driver (aka the driver program) is responsible for converting a user application into smaller execution units called tasks and then scheduling them to run on executors, using resources obtained through a cluster manager. The driver is also responsible for executing the Spark application and returning the status/results to the user.

The Spark driver contains various components: DAGScheduler, TaskScheduler, SchedulerBackend and BlockManager. They are responsible for translating user code into actual Spark jobs executed on the cluster.
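To make the "user code to tasks" translation more concrete, here is a toy sketch in plain Python. It is not Spark code: `Op`, `split_into_stages` and `plan_tasks` are hypothetical names made up for illustration, and the model is deliberately simplified (a new stage starts at each shuffle, and every stage gets one task per partition with the same partition count).

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    needs_shuffle: bool = False  # True for wide operations such as reduceByKey


def split_into_stages(ops):
    """Cut the operation chain into stages at every shuffle boundary."""
    stages = [[]]
    for op in ops:
        if op.needs_shuffle and stages[-1]:
            stages.append([])  # a shuffle starts a new stage
        stages[-1].append(op.name)
    return stages


def plan_tasks(stages, num_partitions):
    """Create one task per (stage, partition) pair."""
    return [(stage_id, partition)
            for stage_id in range(len(stages))
            for partition in range(num_partitions)]


# Hypothetical word-count style job, written as a chain of operations.
ops = [Op("textFile"), Op("flatMap"), Op("map"),
       Op("reduceByKey", needs_shuffle=True), Op("collect")]
stages = split_into_stages(ops)
tasks = plan_tasks(stages, num_partitions=4)

for stage_id, names in enumerate(stages):
    print(f"stage {stage_id}: {names}")
print(f"{len(tasks)} tasks in total")
```

In real Spark the partition count can differ between stages and the scheduling logic is far richer; the sketch only shows the general shape of the work the driver does.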

Whereas the Application Master:

The Application Master is responsible for the execution of a single application. It asks the Resource Scheduler (Resource Manager) for containers and launches specific programs in the containers it obtains. The Application Master is just a broker: it negotiates resources with the Resource Manager and, once it has been granted containers, makes sure the application's processes are launched in them. In Spark on YARN those processes are the executors; the tasks themselves are then scheduled onto the executors by the driver, not by the Application Master.
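The broker role can be sketched as a toy model in plain Python. This is not the YARN API: the `ResourceManager` and `ApplicationMaster` classes and their methods below are hypothetical stand-ins, and the only assumption is that the Resource Manager grants at most as many containers as it has free.

```python
class ResourceManager:
    """Toy stand-in for the YARN Resource Manager."""

    def __init__(self, free_containers):
        self.free_containers = free_containers

    def allocate(self, requested):
        # Grant as many containers as are free, up to the requested number.
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted


class ApplicationMaster:
    """Toy broker: asks the RM for containers and starts one process in each."""

    def __init__(self, rm):
        self.rm = rm
        self.executors = []

    def request_executors(self, wanted):
        granted = self.rm.allocate(wanted)
        for _ in range(granted):
            # Launching an executor is modelled as just recording its name.
            self.executors.append(f"executor-{len(self.executors) + 1}")
        return granted


rm = ResourceManager(free_containers=3)
am = ApplicationMaster(rm)
first = am.request_executors(2)   # the cluster has room for both
second = am.request_executors(4)  # only one container is left
print(first, second, am.executors)
```

Note that nothing in this model knows about jobs, stages or tasks; that knowledge stays with the driver.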

In a nutshell, the driver program translates your custom logic into jobs, stages and tasks, while the Application Master makes sure enough resources are obtained from the RM and keeps track of the status of the containers your application runs in.

As already stated in the references you provided, the only difference between client and cluster mode is this:

In client mode, the driver runs on the machine from which the Spark application/job was submitted, and the AM runs on one of the cluster nodes.

In cluster mode, the driver runs inside the Application Master, which means the Application Master has much more responsibility: it hosts the driver in addition to negotiating resources.
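As a compact summary of the two modes on YARN, here is a small lookup written in plain Python. The function name `process_placement` and the returned labels are made up for illustration; the mode names match the values accepted by the `--deploy-mode` option of `spark-submit`.

```python
def process_placement(deploy_mode):
    """Return where the driver and the Application Master run on YARN."""
    if deploy_mode == "client":
        return {
            "driver": "client machine",
            "application_master": "cluster node",
        }
    if deploy_mode == "cluster":
        return {
            "driver": "cluster node (inside the Application Master)",
            "application_master": "cluster node",
        }
    raise ValueError(f"unknown deploy mode: {deploy_mode!r}")


for mode in ("client", "cluster"):
    print(mode, process_placement(mode))
```

For example, submitting with `--master yarn --deploy-mode cluster` corresponds to the second branch, where closing the client session does not take the driver down with it.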

References:

https://luminousmen.com/post/spark-anatomy-of-spark-application#:~:text=The%20Driver(aka%20driver%20program,status%2Fresults%20to%20the%20user.

https://www.edureka.co/community/1043/difference-between-application-master-application-manager#:~:text=The%20Application%20Master%20is%20responsible,class)%20on%20the%20obtained%20containers.

Answered by kavetiraviteja, Oct 20 '22 21:10