
Chaining Multi-Reducers in a Hadoop MapReduce job

Now I have a 4-phase MapReduce job as follows:

Input -> Map1 -> Reduce1 -> Reduce2 -> Reduce3 -> Reduce4 -> Output

I notice that Hadoop has a ChainMapper class, which can chain several mappers into one big mapper and save the disk I/O cost between map phases. There is also a ChainReducer class, but it is not a real "chain reducer": it only supports jobs of the form:

[MAP+ / REDUCE MAP*]

I know I can set up four MR jobs for my task and use the default mapper for the last three. But that costs a lot of disk I/O, since each reducer has to write its result to disk so that the following mapper can read it. Is there any built-in Hadoop feature that chains reducers to lower the I/O cost?

I am using Hadoop 1.0.4.

Yuhao asked Jun 01 '13 08:06


1 Answer

I don't think you can feed the output of one reducer directly into another reducer. I would go for this:

Input-> Map1 -> Reduce1 -> 
        Identity mapper -> Reduce2 -> 
                Identity mapper -> Reduce3 -> 
                         Identity mapper -> Reduce4 -> Output
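
To see why each extra reduce needs its own map and shuffle step, here is a minimal in-memory sketch of the first two stages of the pipeline above. It is plain Java, not Hadoop code. The class name ReduceChainSketch, the helpers entry, shuffle, stage and run, and the two reducer bodies (sum per word, then maximum per first letter) are all hypothetical and only illustrate the idea that a reducer's output has to be regrouped by key before the next reducer can consume it.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

// In-memory model only; every name here is hypothetical, none is a Hadoop API.
final class ReduceChainSketch {

    // Small helper to build a (key, value) record.
    static Map.Entry<String, Integer> entry(String key, int value) {
        return new SimpleEntry<String, Integer>(key, value);
    }

    // Shuffle: group values by key (sorted by key), as happens before every reduce phase.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> records) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            groups.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return groups;
    }

    // One stage: identity map (records pass through unchanged), shuffle, then reduce.
    static List<Map.Entry<String, Integer>> stage(
            List<Map.Entry<String, Integer>> input,
            BiFunction<String, List<Integer>, Map.Entry<String, Integer>> reducer) {
        List<Map.Entry<String, Integer>> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(input).entrySet()) {
            output.add(reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Two chained stages: Reduce1 output is regrouped before Reduce2 sees it.
    static List<Map.Entry<String, Integer>> run(List<Map.Entry<String, Integer>> input) {
        // Reduce1 (assumed logic): sum the counts per word, re-key by first letter.
        List<Map.Entry<String, Integer>> afterReduce1 = stage(input, (word, counts) -> {
            int total = 0;
            for (int c : counts) {
                total += c;
            }
            return entry(word.substring(0, 1), total);
        });
        // Reduce2 (assumed logic): keep the largest per-word total for each first letter.
        return stage(afterReduce1, (letter, totals) -> entry(letter, Collections.max(totals)));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = Arrays.asList(
                entry("apple", 1), entry("avocado", 1), entry("apple", 1), entry("banana", 1));
        System.out.println(run(input)); // prints [a=2, b=1]
    }
}
```

Reduce1 emits ("a", 2), ("a", 1) and ("b", 1), so the two "a" records only meet in Reduce2 because of the second shuffle. In a real job that regrouping is what the identity mapper plus the framework's shuffle provide, which is why the intermediate output is written out between jobs.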

In the Hadoop 2.x series, within a single job you can chain mappers before the reducer with ChainMapper and chain mappers after the reducer with ChainReducer. That still does not let one reducer feed another directly.

Tejas Patil answered Nov 15 '22 12:11