I want to know the detailed differences between MapReduce 1 and MapReduce 2. What does the inclusion of YARN actually add to Hadoop? I am a beginner who wants to learn Apache Hadoop. Can anyone suggest where to begin? Also, what does a Hadoop cluster setup look like? Thank you for all the help.
With Hadoop 2, Apache separated the management of the map/reduce process from the cluster's resource management (YARN is the new resource manager). The separation allows two things. The first is specialization: YARN is a better resource manager than the one we had in MR1. The second is versatility: the resource manager can support additional paradigms, not just map/reduce, and indeed we see a whole range of frameworks that YARN can manage, such as Tez, Hama, Storm and even HBase.
You can check out the Hortonworks YARN page as a good starting point to understand what YARN is and what it does.
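To make the "resource manager that is not tied to map/reduce" idea concrete, here is a minimal toy sketch in Python. It is not the YARN API: `ToyResourceManager`, `allocate`, `release` and the application names are all hypothetical, and real YARN containers carry memory and CPU sizes rather than being simple slots. The point is only that the manager hands out containers without knowing which framework asks for them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a YARN-like resource manager.

    It tracks free container slots and knows nothing about what
    runs inside them (MapReduce, Tez, or anything else).
    """
    free_containers: int
    allocations: dict = field(default_factory=dict)

    def allocate(self, app_name: str, wanted: int) -> int:
        # Grant as many containers as are free, up to the request.
        granted = min(wanted, self.free_containers)
        self.free_containers -= granted
        self.allocations[app_name] = self.allocations.get(app_name, 0) + granted
        return granted

    def release(self, app_name: str) -> None:
        # Return every container held by the application to the pool.
        self.free_containers += self.allocations.pop(app_name, 0)


rm = ToyResourceManager(free_containers=10)
granted_mr = rm.allocate("mapreduce-job", 6)   # a MapReduce application
granted_tez = rm.allocate("tez-dag", 6)        # a different paradigm, same manager
rm.release("mapreduce-job")                    # finished job frees its containers
```

Both applications go through the same `allocate` call, which is roughly the versatility described above: the manager only deals in resources, and each framework decides what to run in them.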
In the MR1 architecture, the cluster was managed by a service called the JobTracker. TaskTracker services lived on each node and launched tasks on behalf of jobs. The JobTracker also served information about completed jobs.
In the MR2 architecture, the old MR1 framework was rewritten to run as a submitted application on top of YARN. This application was christened MR2, or MapReduce version 2. It is the familiar MapReduce execution underneath, except that each job now controls its own destiny via its own ApplicationMaster, which takes care of the execution flow (scheduling tasks, handling speculative execution and failures, and so on).
Reference: http://blog.cloudera.com/blog/2013/11/migrating-to-mapreduce-2-on-yarn-for-operators/
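The contrast between one central JobTracker and one ApplicationMaster per job can be sketched with a small toy model. This is an illustration only, not Hadoop code: the classes `ToyJobTracker` and `ToyApplicationMaster`, their methods, and the job names are hypothetical, and the real services do far more (scheduling, heartbeats, speculative execution).

```python
class ToyJobTracker:
    """MR1-style toy: one central service holds task state for every job."""

    def __init__(self):
        self.task_state = {}  # (job_id, task_id) -> status

    def submit(self, job_id, num_tasks):
        for task_id in range(num_tasks):
            self.task_state[(job_id, task_id)] = "pending"


class ToyApplicationMaster:
    """MR2-style toy: one instance per job, holding only that job's tasks."""

    def __init__(self, job_id, num_tasks):
        self.job_id = job_id
        self.task_state = {task_id: "pending" for task_id in range(num_tasks)}

    def report(self, task_id, status):
        # Record a status update for one of this job's tasks.
        self.task_state[task_id] = status

    def tasks_to_reschedule(self):
        # Failure handling is local to the job that owns the tasks.
        return [t for t, s in self.task_state.items() if s == "failed"]


# MR1 style: every task of every job lands in the same central table.
tracker = ToyJobTracker()
tracker.submit("job-a", 3)
tracker.submit("job-b", 2)
central_entries = len(tracker.task_state)

# MR2 style: each job gets its own master and tracks its own tasks.
masters = {
    "job-a": ToyApplicationMaster("job-a", 3),
    "job-b": ToyApplicationMaster("job-b", 2),
}
masters["job-a"].report(1, "failed")
retry_a = masters["job-a"].tasks_to_reschedule()
retry_b = masters["job-b"].tasks_to_reschedule()
```

In the toy, a failure in `job-a` only touches the state held by that job's master, which mirrors the idea of each job controlling its own execution flow.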