Now I have a 4-phase MapReduce job as follows:
Input -> Map1 -> Reduce1 -> Reduce2 -> Reduce3 -> Reduce4 -> Output
I notice that there is a ChainMapper class in Hadoop which can chain several mappers into one big mapper and save the disk I/O cost between map phases. There is also a ChainReducer class; however, it is not a real "chain reducer". It can only support jobs of the form:
[MAP+ / REDUCE MAP*]
I know I can set up four MR jobs for my task and use the default (identity) mappers for the last three jobs. But that costs a lot of disk I/O, since each reducer has to write its results to disk so that the following mapper can read them. Is there any other built-in Hadoop feature that chains my reducers and lowers the I/O cost?
I am using Hadoop 1.0.4.
If there are a lot of key-value pairs to merge, a single reducer might take too much time. To avoid the reducer machine becoming the bottleneck, we use multiple reducers. When you have multiple reducers, each node that runs a mapper partitions its key-value pairs into multiple buckets just after sorting.
You set each Reducer with setReducerClass in each job's main (driver) class.
We use the MultipleInputs class, which supports MapReduce jobs with multiple input paths, each with a different InputFormat and Mapper.
The default number of reducers for any job is 1. The number of reducers can be set in the job configuration.
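For concreteness, here is a minimal driver sketch that puts these pieces together: setReducerClass, MultipleInputs.addInputPath for two input paths with different mappers, and setNumReduceTasks to override the default of 1. LogMapper, CsvMapper, MergeReducer, and the paths are hypothetical placeholders; also note that the org.apache.hadoop.mapreduce version of MultipleInputs shown here only ships with newer Hadoop releases, so on 1.x you may need the older org.apache.hadoop.mapred.lib.MultipleInputs instead.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MergeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "merge-job");
        job.setJarByClass(MergeDriver.class);

        // Each input path gets its own InputFormat and Mapper.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, LogMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, CsvMapper.class);

        // One reducer class per job; raise the count from the default of 1
        // so a single reducer machine does not become the bottleneck.
        job.setReducerClass(MergeReducer.class);
        job.setNumReduceTasks(4);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```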
I don't think you can feed the output of one reducer directly into another reducer. I would have gone for this:
Input-> Map1 -> Reduce1 ->
Identity mapper -> Reduce2 ->
Identity mapper -> Reduce3 ->
Identity mapper -> Reduce4 -> Output
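A rough sketch of the first two links of that chain, assuming hypothetical Map1, Reduce1, and Reduce2 classes: job 1 writes its reducer output as a SequenceFile, and job 2 reads it back with the built-in identity Mapper before Reduce2. Phases 3 and 4 would repeat the same pattern, each paying the disk round trip the question is worried about.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Phase 1: Map1 -> Reduce1, intermediate output kept as a SequenceFile
        // so the next job preserves the (Text, Text) key-value types.
        Job job1 = new Job(conf, "phase-1");
        job1.setJarByClass(ChainDriver.class);
        job1.setMapperClass(Map1.class);
        job1.setReducerClass(Reduce1.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(Text.class);
        job1.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(job1, new Path("input"));
        FileOutputFormat.setOutputPath(job1, new Path("tmp/phase1"));
        if (!job1.waitForCompletion(true)) System.exit(1);

        // Phase 2: identity Mapper simply passes records through to Reduce2.
        Job job2 = new Job(conf, "phase-2");
        job2.setJarByClass(ChainDriver.class);
        job2.setMapperClass(Mapper.class); // the base Mapper is an identity map
        job2.setReducerClass(Reduce2.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        job2.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job2, new Path("tmp/phase1"));
        FileOutputFormat.setOutputPath(job2, new Path("tmp/phase2"));
        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}
```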
In the Hadoop 2.x series, you can chain mappers before the reducer with ChainMapper and chain mappers after the reducer with ChainReducer, all within a single job.
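A minimal sketch of that [MAP+ / REDUCE MAP*] layout with the Hadoop 2.x org.apache.hadoop.mapreduce.lib.chain classes; TokenMapper, UpperCaseMapper, SumReducer, and FilterMapper are hypothetical placeholders for your own classes. The chained mappers run inside the same map or reduce task, so records are passed along in memory rather than through HDFS.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "chained-job");
        job.setJarByClass(ChainedJobDriver.class);

        // MAP+: any number of mappers before the single reduce phase.
        ChainMapper.addMapper(job, TokenMapper.class,
                LongWritable.class, Text.class, Text.class, IntWritable.class,
                new Configuration(false));
        ChainMapper.addMapper(job, UpperCaseMapper.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));

        // Exactly one reducer...
        ChainReducer.setReducer(job, SumReducer.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));
        // ...followed by MAP*: zero or more mappers chained after the reducer.
        ChainReducer.addMapper(job, FilterMapper.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```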