 

What is the difference between classic and local for mapreduce.framework.name in mapred-site.xml?

The official description for this parameter is as follows:

The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.

I know that the value 'yarn' is for MRv2, which submits the MapReduce job to the ResourceManager. But what is the difference between local and classic? Which one corresponds to MRv1?
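For context, the property lives in mapred-site.xml and would look roughly like this (the value shown is just one of the three options):

    <!-- mapred-site.xml: the value can be local, classic or yarn -->
    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>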

Thanks a lot!

Judking asked Nov 01 '14


2 Answers

You are right: "yarn" stands for MRv2, "classic" is for MRv1, and "local" is for local runs of MR jobs. But why do you need MRv1? YARN is out of beta now and is more stable than the old MRv1 framework, while your MapReduce jobs can still use the old "mapred" API.

0x0FFF answered Oct 03 '22

I agree with the above answer; I would like to add one more point.

Classic is MRv1. Whenever we submit MR jobs with the framework name set to classic, the job is submitted to the JobTracker daemon of MRv1, which coordinates the MapReduce execution; each task is executed in a separate JVM.

The main purpose of the local job runner, on the other hand, is debugging/testing a MapReduce program with small inputs. It doesn't need any daemons such as the JobTracker or TaskTrackers. This execution mode is what you get when you run an MR application from Eclipse: by default the execution uses the local job runner (the same JVM that Eclipse uses). All mappers and reducers execute in that same JVM. Because a single JVM is used for everything (driver + map + reduce), you cannot use this mode to process large data; the execution will end up with an OutOfMemoryError.
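To illustrate, here is a minimal driver sketch (not from the original post; the input/output paths are hypothetical) that forces the local job runner even if the cluster-wide default is yarn, which is handy for stepping through mapper/reducer code in an IDE:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class LocalRunnerDebug {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Force the local job runner: driver, mappers and reducers all run in this JVM.
            conf.set("mapreduce.framework.name", "local");
            // Use the local filesystem instead of HDFS for the small test input.
            conf.set("fs.defaultFS", "file:///");

            Job job = Job.getInstance(conf, "local-debug");
            job.setJarByClass(LocalRunnerDebug.class);
            // Mapper/Reducer classes would be set here; the identity classes are used if omitted.
            FileInputFormat.addInputPath(job, new Path("testdata/input"));    // hypothetical path
            FileOutputFormat.setOutputPath(job, new Path("testdata/output")); // hypothetical path

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }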

SachinJ answered Oct 03 '22