The official description for the mapreduce.framework.name parameter is as follows:
The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.
I know that the value 'yarn' is for MRv2, which submits the MapReduce job to the ResourceManager. But what is the difference between local and classic? Which one corresponds to MRv1?
Thanks a lot!
The mapred-site.xml file contains the configuration settings for the MapReduce daemons: the jobtracker and the tasktrackers.
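For reference, a minimal mapred-site.xml sketch that selects the runtime framework could look like this (shown with yarn; local and classic are the other accepted values):

```xml
<!-- mapred-site.xml: picks the runtime framework for MapReduce jobs -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- one of: local, classic, yarn -->
    <value>yarn</value>
  </property>
</configuration>
```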
A Hadoop Mapper is a function or task that processes the input records from a file and generates output that serves as the input for the Reducer. It produces that output by emitting new key-value pairs.
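As a rough illustration of what such a mapper looks like in code, here is a minimal word-count-style sketch (the class name WordCountMapper and the tokenizing logic are just assumptions for the example):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Minimal sketch: processes one input record (a line of text) per map() call
// and emits (word, 1) pairs, which become the input of the Reducer.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // new key-value pair for the Reducer
            }
        }
    }
}
```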
The reducer uses the data types specific to Hadoop MapReduce. The reduce(Object, Iterable, Context) method is called once for each <key, (collection of values)> pair in the sorted input. The output of the reduce task is written to a RecordWriter via TaskInputOutputContext.write(Object, Object).
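A matching reducer might look like the following sketch (assuming the word-count types from the hypothetical mapper above):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal sketch: reduce() is called once per key with all values for that key;
// context.write() passes the result on to the configured RecordWriter.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum)); // one output record per key
    }
}
```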
Consider that the Hadoop system uses the default split size of 128 MB. Then Hadoop will store 1 GB of data in 8 blocks (1024 MB / 128 MB = 8). So to process these 8 blocks, i.e. 1 GB of data, 8 mappers are required. First of all, though, this depends on whether the files can be split by Hadoop (are splittable) or not.
You are right, "yarn" stands for MRv2. "classic" is for MRv1, and "local" is for local runs of MR jobs. But why do you need MRv1? YARN is out of beta now and is more stable than the old MRv1 framework, while your MapReduce jobs can still use the old "mapred" API.
I agree with the above answer, and I would like to add one more point.
"classic" is MR1. Whenever we submit MR jobs with the framework name set to classic, the job is submitted to the JobTracker daemon of MR1, which coordinates the MapReduce execution; each task is executed in a different JVM.
The main purpose of the local job runner, on the other hand, is debugging/testing a MapReduce program with small inputs. It doesn't need any daemons like the JobTracker or TaskTracker. This execution mode is useful when you run an MR application from Eclipse: by default the execution happens in the local job runner (which uses the same JVM that Eclipse uses). All mappers/reducers execute in that same JVM. Because the same JVM is used for the whole execution (driver + map + reduce), you cannot use this mode to process large data; the execution will end with an OutOfMemoryError.
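As a sketch of that debugging workflow, a driver can force the local runner explicitly (the class names and paths here are hypothetical placeholders; WordCountMapper/WordCountReducer refer to the example sketches above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalDebugDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run everything in the current JVM (local job runner); no JobTracker,
        // TaskTracker or ResourceManager daemons are needed.
        conf.set("mapreduce.framework.name", "local");
        // Read the small test input from the local filesystem instead of HDFS.
        conf.set("fs.defaultFS", "file:///");

        Job job = Job.getInstance(conf, "local debug run");
        job.setJarByClass(LocalDebugDriver.class);
        job.setMapperClass(WordCountMapper.class);   // hypothetical mapper sketch above
        job.setReducerClass(WordCountReducer.class); // hypothetical reducer sketch above
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```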