
Differences between existing MapReduce and YARN (MRv2)

Could anyone tell me what the differences are between the existing MapReduce and YARN? I have not been able to find a clear explanation of how the two differ.

P.S.: I'm looking for something like a side-by-side comparison of the two.

Thanks!

Yon asked Aug 27 '13 10:08


2 Answers

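MRv1 uses the JobTracker to create tasks and assign them to the TaskTrackers running on the data nodes. Because that single JobTracker handles both resource management and job scheduling/monitoring for the whole cluster, it can become a bottleneck once the cluster scales out far enough (usually around 4,000 nodes).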

MRv2 runs on YARN ("Yet Another Resource Negotiator"), which has one Resource Manager per cluster and a Node Manager on each data node. For each job, a container on one of the worker nodes runs that job's Application Master, which negotiates resources with the Resource Manager and monitors the job's tasks.
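To make that split of responsibilities concrete, here is a toy Python model. It is only a sketch and not Hadoop's actual API: the class names mirror the YARN roles, but every method, field, and number here is made up for illustration, the first-fit placement is an assumption, and the sketch ignores the fact that the Application Master itself runs in a container.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent (hypothetical model): hands out this node's memory as containers."""
    name: str
    free_mb: int

    def launch_container(self, mb: int) -> bool:
        # Refuse the request if the node does not have enough free memory.
        if mb > self.free_mb:
            return False
        self.free_mb -= mb
        return True


@dataclass
class ResourceManager:
    """One per cluster: only arbitrates resources, knows nothing about map or reduce."""
    nodes: list

    def allocate(self, mb: int):
        # First-fit placement is an assumption; real YARN schedulers are far more elaborate.
        for node in self.nodes:
            if node.launch_container(mb):
                return node.name
        return None  # no node can host the container right now


@dataclass
class ApplicationMaster:
    """One per job: requests containers from the ResourceManager and tracks its own tasks."""
    job: str
    rm: ResourceManager
    placements: dict = field(default_factory=dict)

    def run_tasks(self, tasks, mb_per_task):
        for task in tasks:
            self.placements[task] = self.rm.allocate(mb_per_task)
        return self.placements


rm = ResourceManager([NodeManager("node1", 2048), NodeManager("node2", 1024)])
am = ApplicationMaster("wordcount", rm)
placements = am.run_tasks(["map0", "map1", "map2", "reduce0"], mb_per_task=1024)
print(placements)
```

The point of the sketch is that per-job bookkeeping lives in the `ApplicationMaster` object, while the `ResourceManager` only answers "where is there room?". In MRv1 the JobTracker did both jobs for every job in the cluster.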

tommy_o answered Sep 28 '22 00:09


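MRv1 is the MapReduce implementation in Hadoop 1, where resource management and scheduling are tightly coupled with the MapReduce programming framework (both are handled by the JobTracker), with HDFS providing storage. Because of this coupling, non-MapReduce applications (for example, non-batch workloads) cannot run on a Hadoop 1 cluster. Hadoop 1 also has a single NameNode, which is a single point of failure, so it does not provide high availability and its scalability is limited.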

MRv2 is the MapReduce implementation in Hadoop 2. In this version, resource management and scheduling are separated out of MapReduce and handled by YARN (Yet Another Resource Negotiator), a layer that sits beneath MapReduce. Hadoop 2 also improves availability and scalability on the HDFS side, since redundant NameNodes can be configured. Another new HDFS feature is snapshots, which let you take point-in-time backups of the filesystem and help with disaster recovery.

Ajit K'sagar answered Sep 28 '22 02:09