I noticed that there are two sets of Hadoop configuration parameters: one prefixed with mapred.* and the other with mapreduce.*. I am guessing this is due to the old API vs. the new API, but if I am not mistaken, both seem to coexist in the new API. Am I correct? If so, is there a general rule for what mapred.* is used for and what mapreduce.* is used for?
MapReduce is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the heart of Apache Hadoop. The term "MapReduce" refers to two separate and distinct tasks that Hadoop programs perform.
MapReduce facilitates concurrent processing by splitting petabytes of data into smaller chunks, and processing them in parallel on Hadoop commodity servers. In the end, it aggregates all the data from multiple servers to return a consolidated output back to the application.
In Hadoop, the Reducer takes the output of the Mapper (intermediate key-value pairs) and processes each of them to generate the output. The output of the Reducer is the final output, which is stored in HDFS. Typically, the Reducer performs aggregation or summation-style computation.
Examining the source for 0.20.2, there are only a few mapreduce.* properties, and they revolve around configuring the job input/output format, mapper/combiner/reducer and partitioner classes. They also signal to the job client that the user is using the new API (look through the source of org.apache.hadoop.mapreduce.Job for the setUseNewAPI() method):
mapreduce.inputformat.class
mapreduce.outputformat.class
mapreduce.partitioner.class
mapreduce.map.class
mapreduce.combine.class
mapreduce.reduce.class
There are a few more properties, but they are secondary configuration.
The input and output formats, whether new or old API versions, typically use mapred.* properties.
For example, to specify your MapReduce input paths you use mapred.input.dir (whether you're using the new or old API). The same goes for the output property, mapred.output.dir.
So the long and the short of it is: if there isn't a utility method to configure the property (such as FileInputFormat.setInputPaths(Job, String)), then you'll need to check the source.
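When checking which family a job actually ended up using, it can help to dump the configured keys grouped by prefix. Below is a minimal sketch of that idea using only the JDK. The PropertyFamilies class and its methods are hypothetical names made up for illustration, the plain Map is just a stand-in for a real job configuration, and the values are invented. The property names are the ones mentioned above, plus fs.default.name as an example of a key outside both families.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Hadoop): groups configuration keys by property family. */
final class PropertyFamilies {

    /** Returns "mapreduce.*", "mapred.*", or "other" for a property name. */
    static String familyOf(String key) {
        // The trailing dot keeps the two prefixes distinct:
        // "mapreduce.x" does not start with "mapred.".
        if (key.startsWith("mapreduce.")) return "mapreduce.*";
        if (key.startsWith("mapred.")) return "mapred.*";
        return "other";
    }

    /** Groups every entry under its family; families and keys are sorted alphabetically. */
    static Map<String, Map<String, String>> groupByFamily(Map<String, String> conf) {
        Map<String, Map<String, String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            grouped.computeIfAbsent(familyOf(e.getKey()), f -> new TreeMap<>())
                   .put(e.getKey(), e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for a job configuration; all values here are made up.
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapreduce.map.class", "com.example.MyMapper");
        conf.put("mapreduce.inputformat.class", "com.example.MyInputFormat");
        conf.put("mapred.input.dir", "/data/in");
        conf.put("mapred.output.dir", "/data/out");
        conf.put("fs.default.name", "hdfs://namenode:8020");

        // Print each family followed by its indented key/value pairs.
        groupByFamily(conf).forEach((family, entries) -> {
            System.out.println(family);
            entries.forEach((k, v) -> System.out.println("  " + k + " = " + v));
        });
    }
}
```

With a real job you would feed in the entries of the job's configuration instead of the hand-built map. The grouping only tells you which keys are set, not which API reads them, so the source is still the final word.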