
What is the difference between MR1 and MR2?

I want to understand in detail the difference between MapReduce 1 and MapReduce 2. What does the inclusion of YARN actually add to Hadoop? I am a beginner who wants to learn Apache Hadoop. Can anyone suggest where to begin? Also, what does a Hadoop cluster setup look like? Thank you for all the help.

Arunima Joshi asked Dec 21 '22 00:12



2 Answers

With Hadoop 2, Apache separated the management of the map/reduce process from the cluster's resource management (YARN is the new resource manager). The separation allows two things. First, specialization: YARN is a better resource manager than the one we had in MR1. Second, versatility: the resource manager can support additional paradigms, not just map/reduce, and indeed we see a whole range of frameworks that YARN can manage, such as Tez, Hama, Storm and even HBase.

You can check out the Hortonworks YARN page as a good starting point to understand what YARN is and what it does.

Arnon Rotem-Gal-Oz answered Dec 22 '22 13:12



In the MR1 architecture, the cluster was managed by a service called the JobTracker. TaskTracker services lived on each node and would launch tasks on behalf of jobs. The JobTracker would serve information about completed jobs.
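To make the MR1 picture concrete, here is a toy Python model, not Hadoop code. The class and method names (`TaskTracker.launch`, `JobTracker.submit`, `JobTracker.finish`) are invented for illustration. It only tries to show that one central service both hands out task slots and keeps track of every job.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Toy per-node daemon with a fixed number of task slots."""
    node: str
    slots: int
    running: list = field(default_factory=list)

    def launch(self, task: str) -> bool:
        # Refuse the task when every slot on this node is busy.
        if len(self.running) >= self.slots:
            return False
        self.running.append(task)
        return True


class JobTracker:
    """Toy central service: allocates slots AND tracks every job."""

    def __init__(self, trackers):
        self.trackers = trackers
        self.job_tasks = {}   # job id -> list of (task, node) placements
        self.completed = []   # history of finished jobs

    def submit(self, job_id, tasks):
        placed = []
        for task in tasks:
            # Place each task on the first node that has a free slot.
            for tt in self.trackers:
                if tt.launch(f"{job_id}:{task}"):
                    placed.append((task, tt.node))
                    break
        self.job_tasks[job_id] = placed
        return placed

    def finish(self, job_id):
        # Free the job's slots and record it in the job history.
        prefix = job_id + ":"
        for tt in self.trackers:
            tt.running = [t for t in tt.running if not t.startswith(prefix)]
        self.completed.append(job_id)
```

For example, with `JobTracker([TaskTracker("node1", 2), TaskTracker("node2", 1)])`, submitting a job with three tasks fills both slots on `node1` and the single slot on `node2`. In this model, resource allocation and job bookkeeping for every job all go through the one `JobTracker` object.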

In the MR2 architecture, the old MR1 framework was rewritten to run within a submitted application on top of YARN. This application was christened MR2, or MapReduce version 2. It is the familiar MapReduce execution underneath, except that each job now controls its own destiny via its own ApplicationMaster, which takes care of execution flow (scheduling tasks, handling speculative execution and failures, and so on).
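The per-job ApplicationMaster idea can be sketched with a toy Python model. This is not the YARN API: `ResourceManager`, `MapReduceAppMaster`, `allocate`, `release`, and the `execute` callback are all hypothetical names chosen for illustration, and tasks run one at a time to keep the sketch short. The point is only that the cluster-wide service hands out containers, while retry and scheduling logic lives in an object created per job.

```python
class ResourceManager:
    """Toy cluster-wide service that only hands out containers."""

    def __init__(self, capacity):
        self.free = dict(capacity)  # node -> number of free containers

    def allocate(self):
        # Return the first node with a free container, or None if full.
        for node, free in self.free.items():
            if free > 0:
                self.free[node] = free - 1
                return node
        return None

    def release(self, node):
        self.free[node] += 1


class MapReduceAppMaster:
    """Toy per-job master: owns task scheduling and retry logic."""

    def __init__(self, job_id, rm, max_attempts=2):
        self.job_id = job_id
        self.rm = rm
        self.max_attempts = max_attempts

    def run(self, tasks, execute):
        """Run tasks sequentially; execute(task, attempt) returns True on success."""
        # The master itself occupies a container for the life of the job.
        am_node = self.rm.allocate()
        if am_node is None:
            raise RuntimeError("no container available for the ApplicationMaster")
        results = {}
        try:
            for task in tasks:
                for attempt in range(1, self.max_attempts + 1):
                    node = self.rm.allocate()
                    if node is None:
                        results[task] = "no-capacity"
                        break
                    ok = execute(task, attempt)
                    self.rm.release(node)
                    if ok:
                        results[task] = f"ok@attempt{attempt}"
                        break
                else:
                    # Every attempt ran and failed.
                    results[task] = "failed"
        finally:
            self.rm.release(am_node)
        return results
```

Because failure handling sits in `MapReduceAppMaster`, a second job would get its own master instance and its own retry policy, and the `ResourceManager` in this sketch never needs to know what a map or reduce task is. That is also why, in this model, a non-MapReduce framework could reuse the same `ResourceManager` with a different master class.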

Reference: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/

Ankit Singhal answered Dec 22 '22 12:12
