Hadoop MapReduce: Clarification on number of reducers

Tags: hadoop, mapreduce, reducers

In the MapReduce framework, one reducer is used for each key generated by the mapper.

So you would think that specifying the number of reducers in Hadoop MapReduce wouldn't make sense, because the number depends on the program. However, Hadoop lets you specify the number of reducers to use (-D mapred.reduce.tasks=<number of reducers>).

What does this mean? Does the parameter specify how many machine resources are allocated to the reducers, rather than the actual number of reducers used?

asked Mar 12 '14 18:03

Bryan

2 Answers

"one reducer is used for each key generated by the mapper"

This statement is not correct. One call to the reduce() method is made for each group of keys, where the grouping comparator decides which keys belong to the same group. A reducer (task) is a process that handles zero or more calls to reduce(). The property you refer to sets the number of reducer tasks, not the number of reduce() calls.
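The distinction between reducer tasks and reduce() calls can be sketched in plain Java. This is only a simulation under stated assumptions, not Hadoop code: the class name ReduceTaskSketch, the word-count style input, and the summing reduce() are hypothetical. The partitioning arithmetic mirrors what Hadoop's default HashPartitioner does, and grouping is done by exact key equality rather than by a configurable grouping comparator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the shuffle; no Hadoop classes are involved.
final class ReduceTaskSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take the remainder.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Route each (word, 1) pair to a reduce task and group the values by key.
    // Grouping here is by exact key equality (an assumption of this sketch).
    static List<Map<String, List<Integer>>> shuffle(List<String> words, int numReduceTasks) {
        List<Map<String, List<Integer>>> tasks = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            tasks.add(new TreeMap<String, List<Integer>>());
        }
        for (String word : words) {
            tasks.get(partitionFor(word, numReduceTasks))
                 .computeIfAbsent(word, k -> new ArrayList<>())
                 .add(1);
        }
        return tasks;
    }

    // Hypothetical stand-in for a user-written reduce(key, values): sums the values.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "pear", "apple", "fig", "pear", "apple", "kiwi");
        List<Map<String, List<Integer>>> tasks = shuffle(words, 2);
        for (int t = 0; t < tasks.size(); t++) {
            // One reduce task makes one reduce() call per distinct key it received.
            System.out.println("reduce task " + t + ": " + tasks.get(t).size() + " reduce() call(s)");
            for (Map.Entry<String, List<Integer>> e : tasks.get(t).entrySet()) {
                System.out.println("  reduce(" + e.getKey() + ") = " + reduce(e.getKey(), e.getValue()));
            }
        }
    }
}
```

With two reducer tasks and four distinct keys, the four reduce() calls are split between the two tasks; changing the task count changes how the calls are distributed, not how many calls are made.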

answered Sep 17 '22 18:09

Judge Mental

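To simplify @Judge Mental's (very accurate) answer a little: a single reducer task can process many keys, one after another, and the mapred.reduce.tasks parameter (named mapreduce.job.reduces in newer Hadoop releases) sets the total number of reducer tasks for a specific job. How many of those tasks actually run at the same time depends on the reduce capacity available in the cluster.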

An example with mapred.reduce.tasks=10:
You have 2,000 keys, each with 50 values (100,000 k:v pairs in total, evenly distributed). Each reducer should handle roughly 200 keys (10,000 k:v pairs).

An example with mapred.reduce.tasks=20:
You have 2,000 keys, each with 50 values (100,000 k:v pairs in total, evenly distributed). Each reducer should handle roughly 100 keys (5,000 k:v pairs).
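The arithmetic behind these two examples can be checked with a short sketch. With 2,000 keys and 50 values per key there are 2,000 × 50 = 100,000 k:v pairs in total. The class and method names below are hypothetical, and the calculation assumes keys spread perfectly evenly across reducers, which real hash partitioning only approximates.

```java
// Back-of-the-envelope load per reducer task, assuming a perfectly even key spread.
final class ReducerLoadSketch {

    // Average number of distinct keys handled by each reducer task.
    static long keysPerReducer(long distinctKeys, int numReduceTasks) {
        return distinctKeys / numReduceTasks;
    }

    // Average number of key:value pairs handled by each reducer task.
    static long pairsPerReducer(long distinctKeys, long valuesPerKey, int numReduceTasks) {
        return distinctKeys * valuesPerKey / numReduceTasks;
    }

    public static void main(String[] args) {
        long distinctKeys = 2_000;
        long valuesPerKey = 50;
        for (int n : new int[] {10, 20}) {
            System.out.println(n + " reducers: "
                    + keysPerReducer(distinctKeys, n) + " keys, "
                    + pairsPerReducer(distinctKeys, valuesPerKey, n) + " pairs each");
        }
        // prints:
        // 10 reducers: 200 keys, 10000 pairs each
        // 20 reducers: 100 keys, 5000 pairs each
    }
}
```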

In the examples above (10 versus 20 reducer tasks), the fewer keys each reducer has to work with, the faster the overall job will generally be, so long as the cluster has enough free reducer capacity to run the extra tasks in parallel, of course.
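The caveat about available resources can be illustrated with a deliberately simplified model. It assumes that reducer tasks run in "waves" when the cluster can only run a limited number at once, that every task takes time proportional to its share of the pairs, and that task startup overhead is zero. The names slots and secondsPerPair and the cost figure used are hypothetical, chosen only for illustration.

```java
// Toy model of the reduce phase: tasks run in waves limited by available slots.
final class ReduceWaveSketch {

    // Number of waves needed when at most `slots` reducer tasks run concurrently.
    static int waves(int numReduceTasks, int slots) {
        return (numReduceTasks + slots - 1) / slots; // ceiling division
    }

    // Rough reduce-phase duration: each wave lasts as long as one task's share of the work.
    static double estimatedSeconds(long totalPairs, int numReduceTasks, int slots, double secondsPerPair) {
        double secondsPerTask = (double) totalPairs / numReduceTasks * secondsPerPair;
        return waves(numReduceTasks, slots) * secondsPerTask;
    }

    public static void main(String[] args) {
        long totalPairs = 100_000;     // 2,000 keys x 50 values
        double secondsPerPair = 0.001; // hypothetical processing cost per pair
        int[][] cases = { {10, 10}, {20, 20}, {20, 10} }; // {reducer tasks, slots}
        for (int[] c : cases) {
            System.out.printf("%d tasks on %d slots: %d wave(s), about %.1f s%n",
                    c[0], c[1], waves(c[0], c[1]),
                    estimatedSeconds(totalPairs, c[0], c[1], secondsPerPair));
        }
    }
}
```

Under these assumptions, doubling the reducer tasks from 10 to 20 halves the estimate only when 20 slots are free; with 10 slots the 20 tasks need two waves and the estimate is the same as with 10 tasks.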

answered Sep 18 '22 18:09

JamCon
