Number of reducers in hadoop

Question

I was learning hadoop, I found number of reducers very confusing :

1) Number of reducers is same as number of partitions.

2) Number of reducers is 0.95 or 1.75 multiplied by (no. of nodes) * (no. of maximum containers per node).

3) Number of reducers is set by mapred.reduce.tasks.

4) Number of reducers is closest to: A multiple of the block size * A task time between 5 and 15 minutes * Creates the fewest files possible.

I am very confused, Do we explicitly set number of reducers or it is done by mapreduce program itself?

How is number of reducers is calculated? Please tell me how to calculate number of reducers.

ViKiG · Accepted Answer

1 - The number of reducers is as number of partitions - False. A single reducer might work on one or more partitions. But a chosen partition will be fully done on the reducer it is started.

2 - That is just a theoretical number of maximum reducers you can configure for a Hadoop cluster. Which is very much dependent on the kind of data you are processing too (decides how much heavy lifting the reducers are burdened with).

3 - The mapred-site.xml configuration is just a suggestion to the Yarn. But internally the ResourceManager has its own algorithm running, optimizing things on the go. So that value is not really the number of reducer tasks running every time.

4 - This one seems a bit unrealistic. My block size might 128MB and everytime I can't have 128*5 minimum number of reducers. That's again is false, I believe.

There is no fixed number of reducers task that can be configured or calculated. It depends on the moment how much of the resources are actually available to allocate.

user3484461 · Answer

Number of reducer is internally calculated from size of the data we are processing if you don't explicitly specify using below API in driver program

job.setNumReduceTasks(x)

By default on 1 GB of data one reducer would be used.

so if you are playing with less than 1 GB of data and you are not specifically setting the number of reducer so 1 reducer would be used .

Similarly if your data is 10 Gb so 10 reducer would be used .

You can change the configuration as well that instead of 1 GB you can specify the bigger size or smaller size.

property in hive for setting size of reducer is :

hive.exec.reducers.bytes.per.reducer

you can view this property by firing set command in hive cli.

Partitioner only decides which data would go to which reducer.

red · Answer

Your job may or may not need reducers, it depends on what are you trying to do. When there are multiple reducers, the map tasks partition their output, each creating one partition for each reduce task. There can be many keys (and their associated values) in each partition, but the records for any given key are all in a single partition. One rule of thumb is to aim for reducers that each run for five minutes or so, and which produce at least one HDFS block’s worth of output. Too many reducers and you end up with lots of small files.

gunner87 · Answer

Partitioner makes sure that same keys from multiple mappers goes to the same reducer. This doesn't mean that number of partitions is equal to number of reducers. However, you can specify number of reduce tasks in the driver program using job instance like job.setNumReduceTasks(2). If you don't specify the number of reduce tasks in the driver program then it picks from the mapred.reduce.tasks which has the default value of 1 (https://hadoop.apache.org/docs/r1.0.4/mapred-default.html) i.e. all mappers output will go to the same reducer.

Also, note that programmer will not have control over number of mappers as it depends on the input split where as programmer can control the number of reducers for any job.

Number of reducers in hadoop

Tags:

hadoop

hadoop2

mapreduce

reducers

bigdata

Mohit Jain

4 Answers

ViKiG

user3484461

red

gunner87

Recent Activity

Donate For Us

Number of reducers in hadoop

Tags:

hadoop

hadoop2

mapreduce

reducers

bigdata

Mohit Jain

4 Answers

ViKiG

user3484461

red

gunner87

Related questions

Recent Activity

Donate For Us