
Can the combiner and reducer be different?

In many MapReduce programs, I see a reducer being used as a combiner as well. I know this is because of the specific nature of those programs. But I am wondering if they can be different.

kee asked Jul 31 '12 01:07


People also ask

Is combiner same as reducer?

Both Reducer and Combiner are conceptually the same thing. The difference is when and where they are executed. A Combiner is executed (optionally) after the Mapper phase in the same Node which runs the Mapper. So there is no Network I/O involved.

How is combiner different from reducer explain with example?

A Combiner, also known as a semi-reducer, is an optional class that operates by accepting the inputs from the Map class and thereafter passing the output key-value pairs to the Reducer class. The main function of a Combiner is to summarize the map output records with the same key.

Is a combiner function a replacement of the reduce function?

A Combiner function is used only where the job calls for it; it does not replace the Reducer.

Does combiner help in reducing number of key value pairs?

Combiners run after the mapper to reduce the number of key-value pairs in the mapper output. They are used for optimization and hence decrease the network load during the shuffle. A Combiner typically performs the same aggregation operation as the reducer.


2 Answers

Yes, a Combiner can be different from the Reducer, although your Combiner will still implement the Reducer interface. Whether a Combiner can be used at all depends on the job. The Combiner operates like a Reducer, but only on the subset of key/value pairs output by each Mapper.

One constraint your Combiner has, unlike a Reducer, is that its input and output key and value types must both match the output types of your Mapper.

Binary Nerd answered Oct 29 '22 14:10


The primary goal of a combiner is to minimize the number of key-value pairs shuffled across the network between mappers and reducers, and thus to save as much bandwidth as possible.
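To make the bandwidth saving concrete, here is a minimal sketch in plain Python. It is a simulation, not the Hadoop API: the function names (`map_words`, `combine`, `reduce_counts`) and the two input splits are made up for illustration. It counts how many key-value pairs would be shuffled for a word count with and without a combiner.

```python
from collections import defaultdict

def map_words(line):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum the counts per key within a single mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def reduce_counts(pairs):
    """Reducer: sum the counts per key across everything that was shuffled."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, one per mapper.
splits = ["a b a a", "b b a c"]
map_outputs = [map_words(s) for s in splits]

# Without a combiner, every mapper record crosses the network.
shuffled_plain = [p for out in map_outputs for p in out]
# With a combiner, each mapper sends at most one record per key.
shuffled_combined = [p for out in map_outputs for p in combine(out)]

print(len(shuffled_plain), len(shuffled_combined))
print(reduce_counts(shuffled_plain) == reduce_counts(shuffled_combined))
```

The final counts are identical either way; only the number of shuffled records changes (8 versus 5 for these splits).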

The rule of thumb for a combiner is that its input and output types must be the same. The reason is that the framework does not guarantee the combiner will run: it may be applied zero, one, or several times, depending on the volume of map output and the number of spills.
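A rough way to see why the types must match is to simulate the combiner being applied a varying number of times. The sketch below is plain Python, not Hadoop code; `simulate`, the `spills` data, and the pass counts are assumptions chosen for illustration. Because the combiner's output has the same shape as its input, its result can be fed back into the combiner or straight into the reducer.

```python
def combine(values):
    """Combiner for a sum: takes a list of ints, returns a list with one int."""
    return [sum(values)]

def reduce_sum(values):
    """Reducer: final sum over whatever values reach it."""
    return sum(values)

def simulate(spills, passes):
    """Apply the combiner `passes` times before the reducer sees the data."""
    if passes == 0:
        # Combiner skipped entirely: raw mapper values go to the reducer.
        merged = [v for spill in spills for v in spill]
    else:
        # First pass: once per spill.
        merged = [v for spill in spills for v in combine(spill)]
        # Further passes: again on the already-combined output.
        for _ in range(passes - 1):
            merged = combine(merged)
    return reduce_sum(merged)

# Hypothetical values for one key, split over two spills.
spills = [[1, 2], [3, 4, 5]]
results = [simulate(spills, n) for n in range(3)]
print(results)
```

All three runs give the same total, which is the property a job relies on when the framework decides on its own how often to run the combiner.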

The reducer can be used as the combiner when it satisfies this rule, i.e. its input and output types are the same.

The other important rule is that a combiner can only be used when the function you want to apply is both commutative and associative, like adding numbers. It does not work for something like an average (if you reuse the reducer code unchanged as the combiner).
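A small numeric check shows the difference between the two cases. This is plain Python with made-up values for two mappers, not Hadoop code: summing partial sums gives the right total, while averaging partial averages does not give the overall average.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    values = list(values)
    return sum(values) / len(values)

# Hypothetical values for the same key, produced by two different mappers.
mapper_a = [1, 2, 3, 4]
mapper_b = [10]

# Sum is associative and commutative: a sum of partial sums is the full sum.
total = sum(mapper_a + mapper_b)
sum_of_sums = sum([sum(mapper_a), sum(mapper_b)])

# Mean is not: a mean of partial means weights each mapper equally,
# regardless of how many values it saw.
true_mean = mean(mapper_a + mapper_b)
mean_of_means = mean([mean(mapper_a), mean(mapper_b)])

print(total, sum_of_sums)          # 20 20
print(true_mean, mean_of_means)    # 4.0 6.25
```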

Now, to answer your question: yes, of course they can be different. When your reducer has different input and output types, you have no choice but to write a separate combiner, typically by copying the reducer code and modifying it.
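One common way to end up with a combiner that differs from the reducer is computing an average. The sketch below is a plain Python simulation under assumed names (`map_value`, `combine`, `reduce_mean`) and made-up data, not the Hadoop API. The mapper emits `(sum, count)` pairs, the combiner keeps that same type, and only the reducer changes the type by emitting a single float.

```python
def map_value(v):
    """Mapper: emit each value as a (sum, count) pair."""
    return (v, 1)

def combine(pairs):
    """Combiner: merge (sum, count) pairs into one (sum, count) pair.

    Input and output types are the same, so it is safe to run
    zero, one, or several times.
    """
    pairs = list(pairs)
    return (sum(s for s, _ in pairs), sum(c for _, c in pairs))

def reduce_mean(pairs):
    """Reducer: merge the pairs, then emit a float (a different type)."""
    total, count = combine(pairs)
    return total / count

# Hypothetical values for one key, produced by two mappers.
mapper_a = [map_value(v) for v in [1, 2, 3, 4]]
mapper_b = [map_value(v) for v in [10]]

with_combiner = reduce_mean([combine(mapper_a), combine(mapper_b)])
without_combiner = reduce_mean(mapper_a + mapper_b)
print(with_combiner, without_combiner)  # 4.0 4.0
```

Here the reducer could not serve as the combiner, because its float output is not a valid input for another combine or reduce step.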

If you are concerned about the logic of the reducer, you can also implement the combiner differently. For example, a combiner can keep a collection object as a local buffer of all the values it receives. This is less risky than doing the same in a reducer, because a reducer is more likely to run out of memory than a combiner. Other logic differences can certainly exist, and do.

user3123372 answered Oct 29 '22 14:10