I think I have a fair understanding of the MapReduce programming model in general, but even after reading the original paper and some other sources many details are unclear to me, especially regarding the partitioning of the intermediate results. I will quickly summarize my understanding of MapReduce so far: We have a potentially very large input data set, which is automatically split up into M different pieces by the MR-Framework. For each piece, the framework schedules one map task which is executed by one of the available processors/machines in my cluster. Each of the M map tasks outputs a set of Key-Value-Pairs, which is stored locally on the same machine that executed this map task. Each machine divides its disk into R partitions and distributes its computed intermediate key value pairs based on the intermediate keys among the partitions. Then, the framework starts for each distinct intermediate key one reduce task which is again executed by any of the available machines. Now my questions are: <ol> <li>In some tutorials it sounds like there could be map and reduce tasks executed in parallel. Is this right? How could that be, assuming that for each distinct intermediate key only one reduce task is started? Do we not have to wait until the last map task is finished before we can start the first reduce task?</li> <li>As we have one reduce task per distinct intermediate key, is it right that each reduce task requires the executing machine to load the corresponding partition from every other machine? Potentially, every machine can have a key-value-pair with the desired intermediate key, so for each reduce task we potentially have to query all other machines. Is that really efficient?</li> <li>The original paper says that the number of partitions (R) is specified by the user. But isn’t a partition the input for a reduce task? Or more exactly: Isn’t the union of all partitions with the same number among all machines the input of one reduce task? That would mean, that R depends on the number of distinct intermediate keys which the user usually doesn’t know.</li> </ol> Conceptually it is clear what the input and outputs of the map and reduce functions/tasks are. But I think I haven’t yet understood MapReduce on the technical level. Could somebody please help me understanding?

<ol> <li>You can start the reducer tasks while the map tasks are still running (using a feature known as slowstart), but the reducers can only run the copy phase (acquiring the completed results from the completed map tasks. It will need to wait for all the mappers to complete before it can actually perform the final sort and reduce.</li> <li>A reduce task actually processes zero, one or more keys (rather than a discrete tasks for each key). Each reducer will need to acquire the map output from each map task that relates to its partition before these intermediate outputs are sorted and then reduced one key set at a time.</li> <li>Back to the note in 2 - a reducer task (one for each partition) runs on zero, one or more keys rather than a single task for each discrete key. </li> </ol> It's also important to understand the spread and variation of your intermediate key as it is hashed and modulo'd (if using the default HashPartitioner) to determine which reduce partition should process that key. Say you had an even number of reducer tasks (10), and output keys that always hashed to an even number - then in this case the modulo of these hashs numbers and 10 will always be an even number, meaning that the odd numbered reducers would never process any data.

How does partitioning in MapReduce exactly work?

Tags:

hadoop

mapreduce

I think I have a fair understanding of the MapReduce programming model in general, but even after reading the original paper and some other sources many details are unclear to me, especially regarding the partitioning of the intermediate results.

I will quickly summarize my understanding of MapReduce so far: We have a potentially very large input data set, which is automatically split up into M different pieces by the MR-Framework. For each piece, the framework schedules one map task which is executed by one of the available processors/machines in my cluster. Each of the M map tasks outputs a set of Key-Value-Pairs, which is stored locally on the same machine that executed this map task. Each machine divides its disk into R partitions and distributes its computed intermediate key value pairs based on the intermediate keys among the partitions. Then, the framework starts for each distinct intermediate key one reduce task which is again executed by any of the available machines.

Now my questions are:

In some tutorials it sounds like there could be map and reduce tasks executed in parallel. Is this right? How could that be, assuming that for each distinct intermediate key only one reduce task is started? Do we not have to wait until the last map task is finished before we can start the first reduce task?
As we have one reduce task per distinct intermediate key, is it right that each reduce task requires the executing machine to load the corresponding partition from every other machine? Potentially, every machine can have a key-value-pair with the desired intermediate key, so for each reduce task we potentially have to query all other machines. Is that really efficient?
The original paper says that the number of partitions (R) is specified by the user. But isn’t a partition the input for a reduce task? Or more exactly: Isn’t the union of all partitions with the same number among all machines the input of one reduce task? That would mean, that R depends on the number of distinct intermediate keys which the user usually doesn’t know.

Conceptually it is clear what the input and outputs of the map and reduce functions/tasks are. But I think I haven’t yet understood MapReduce on the technical level. Could somebody please help me understanding?

331

asked Jul 18 '13 21:07

user1494080

2 Answers

You can start the reducer tasks while the map tasks are still running (using a feature known as slowstart), but the reducers can only run the copy phase (acquiring the completed results from the completed map tasks. It will need to wait for all the mappers to complete before it can actually perform the final sort and reduce.
A reduce task actually processes zero, one or more keys (rather than a discrete tasks for each key). Each reducer will need to acquire the map output from each map task that relates to its partition before these intermediate outputs are sorted and then reduced one key set at a time.
Back to the note in 2 - a reducer task (one for each partition) runs on zero, one or more keys rather than a single task for each discrete key.

It's also important to understand the spread and variation of your intermediate key as it is hashed and modulo'd (if using the default HashPartitioner) to determine which reduce partition should process that key. Say you had an even number of reducer tasks (10), and output keys that always hashed to an even number - then in this case the modulo of these hashs numbers and 10 will always be an even number, meaning that the odd numbered reducers would never process any data.

103

answered Oct 04 '22 16:10

Chris White

Addendum to what Chris said,

Basically, a partitioner class in Hadoop (e.g. Default HashPartitioner)

has to implement this function,

int getPartition(K key, V value, int numReduceTasks)

This function is responsible for returning you the partition number and you get the number of reducers you fixed when starting the job from the numReduceTasks variable, as seen for in the HashPartitioner.

Based on what integer the above function return, Hadoop selects node where the reduce task for a particular key should run.

Hope this helps.

answered Oct 04 '22 16:10

SSaikia_JtheRocker

Related questions
                            
                                Wildcard in Hadoop's FileSystem listing API calls
                            
                                Spark RDD - is partition(s) always in RAM?
                            
                                Rename directory in hdfs
                            
                                Standalone Manager Vs. Yarn Vs. Mesos
                            
                                Error Message: TOK_ALLCOLREF is not supported in current context - while Using DISTINCT in HIVE
                            
                                Why can't hive recognize alias named in select part?
                            
                                How do I determine the size of my HBase Tables ?. Is there any command to do so?
                            
                                How can I include a python package with Hadoop streaming job?
                            
                                Spark can access Hive table from pyspark but not from spark-submit
                            
                                What is the difference between Apache Pig and Apache Hive?
                            
                                Writing data to Hadoop
                            
                                How to find cdh version hadoop
                            
                                How to add partition using hive by a specific date?
                            
                                how to write subquery and use "In" Clause in Hive
                            
                                Hadoop "Permission denied (publickey,password,keyboard-interactive)" warning
                            
                                Distributed local clustering coefficient algorithm (MapReduce/Hadoop)
                            
                                R Hive Thrift Client
                            
                                Yarn MapReduce Job Issue - AM Container launch error in Hadoop 2.3.0
                            
                                Very basic question about Hadoop and compressed input files
                            
                                Spark 2.0 deprecates 'DirectParquetOutputCommitter', how to live without it?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With