I understand how MRv1 works. Now I am trying to understand MRv2. What's the difference between the Application Manager and the Application Master in YARN?
The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption.
The Application Master knows the application logic and thus it is framework-specific. The MapReduce framework provides its own implementation of an Application Master. Unless ResourceManager high availability is configured, the Resource Manager is a single point of failure in YARN.
The Application Master is the process that coordinates the execution of an application in the cluster. Each application has its own unique Application Master that is tasked with negotiating resources (Containers) from the Resource Manager and working with the Node Managers to execute and monitor the tasks.
Application workflow in Hadoop YARN: The Application Master negotiates containers from the Resource Manager. The Application Master notifies the Node Manager to launch containers. Application code is executed in the container. The client contacts the Resource Manager or the Application Master to monitor the application's status.
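The workflow above can be sketched as a toy model. This is only an illustration of who talks to whom, not YARN's actual API: the class names, the round-robin allocation and the `launch`/`allocate`/`run` methods are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical stand-in for a node that runs containers."""
    name: str
    launched: list = field(default_factory=list)

    def launch(self, container_id, task):
        # Execute the application code "inside" the container.
        self.launched.append(container_id)
        return task()


@dataclass
class ResourceManager:
    """Hypothetical stand-in that only hands out containers."""
    nodes: list
    next_id: int = 0

    def allocate(self, count):
        # Grant (container_id, node) pairs, round-robin over the nodes.
        grants = []
        for _ in range(count):
            node = self.nodes[self.next_id % len(self.nodes)]
            grants.append((f"container_{self.next_id}", node))
            self.next_id += 1
        return grants


class ApplicationMaster:
    """One instance per application; it drives the whole run."""

    def __init__(self, rm):
        self.rm = rm
        self.status = "RUNNING"
        self.results = []

    def run(self, tasks):
        # Step 1: negotiate containers from the Resource Manager.
        grants = self.rm.allocate(len(tasks))
        # Steps 2 and 3: ask each Node Manager to launch a container,
        # where the application code is executed.
        for (container_id, node), task in zip(grants, tasks):
            self.results.append(node.launch(container_id, task))
        self.status = "FINISHED"
        return self.results


nodes = [NodeManager("nm1"), NodeManager("nm2")]
rm = ResourceManager(nodes)
am = ApplicationMaster(rm)
results = am.run([lambda: 1, lambda: 2, lambda: 3])
# Step 4: a client would poll am.status to monitor the application.
```

Note that the Resource Manager in this sketch never runs any task itself; all per-application logic lives in the Application Master.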
The terms Application Master and Application Manager are often used interchangeably, but they are different things. The Application Master is the per-application process, running in the application's first container, that requests, launches and monitors application-specific resources, whereas the Application Manager (ApplicationsManager) is a component inside the ResourceManager. More details about the Application Manager are given below.
The ApplicationsManager is responsible for maintaining a collection of submitted applications. After application submission, it first validates the application’s specifications and rejects any application that requests unsatisfiable resources for its ApplicationMaster (i.e., there is no node in the cluster that has enough resources to run the ApplicationMaster itself). It then ensures that no other application was already submitted with the same application ID—a scenario that can be caused by an erroneous or a malicious client. Finally, it forwards the admitted application to the scheduler. This component is also responsible for recording and managing finished applications for a while before they are completely evacuated from the ResourceManager’s memory. When an application finishes, it places an ApplicationSummary in the daemon’s log file. Finally, the ApplicationsManager keeps a cache of completed applications long after applications finish to support users’ requests for application data (via web UI or command line). The configuration property yarn.resourcemanager.max-completed-applications controls the maximum number of such finished applications that the ResourceManager remembers at any point of time. The cache is a first-in, first-out list, with the oldest applications being moved out to accommodate freshly finished applications.
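To make the admission checks and the first-in, first-out cache concrete, here is a minimal sketch. It is not the ResourceManager's real code: `ToyApplicationsManager`, its methods and the memory-only resource check are assumptions for illustration, and `max_completed` merely plays the role of yarn.resourcemanager.max-completed-applications.

```python
from collections import deque


class ToyApplicationsManager:
    """Hypothetical model of admission control plus a finished-app cache."""

    def __init__(self, node_capacities_mb, max_completed):
        self.node_capacities_mb = node_capacities_mb
        self.known_ids = set()
        self.scheduler_queue = []
        # A bounded deque drops its oldest entry on overflow (FIFO).
        self.completed = deque(maxlen=max_completed)

    def submit(self, app_id, am_memory_mb):
        # Reject if no single node could host the ApplicationMaster.
        if am_memory_mb > max(self.node_capacities_mb):
            return "REJECTED: unsatisfiable ApplicationMaster request"
        # Reject if the application ID was already submitted.
        if app_id in self.known_ids:
            return "REJECTED: duplicate application ID"
        self.known_ids.add(app_id)
        # Forward the admitted application to the scheduler.
        self.scheduler_queue.append(app_id)
        return "ACCEPTED"

    def finish(self, app_id):
        # Move the application from the scheduler to the finished cache.
        self.scheduler_queue.remove(app_id)
        self.completed.append(app_id)


apps = ToyApplicationsManager([4096, 8192], max_completed=2)
first = apps.submit("app_1", 1024)
duplicate = apps.submit("app_1", 1024)
too_big = apps.submit("app_2", 16384)
apps.submit("app_3", 1024)
apps.submit("app_4", 1024)
for app_id in ("app_1", "app_3", "app_4"):
    apps.finish(app_id)
remembered = list(apps.completed)  # app_1 has been evicted
```

With `max_completed=2`, finishing a third application pushes the oldest one out of the cache, which mirrors the eviction behaviour described in the quoted passage.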
Reference: Hadoop YARN Book
Here Application refers to a single job assigned to the framework.
The Application Manager is responsible for accepting or rejecting an application when the client submits it to the Resource Manager.
The Application Master is responsible for the execution of a single application once the Resource Manager has launched it in a container on a Node Manager.
Does this make sense?
To understand this concept, we need to understand the complete flow of a job/application submitted via YARN in Hadoop.
Before we jump to the execution flow, we need to understand some key concepts:
KEY CONCEPTS:
Now, let's discuss the job/application flow via YARN
I hope this brings some clarity