
"Combiner" class in a MapReduce job

A Combiner runs after the Mapper and before the Reducer. It receives as input all data emitted by the Mapper instances on a given node, then emits its output to the Reducers.

Also, if a reduce function is both commutative and associative, then it can be used as a Combiner.

My question is: what does the phrase "commutative and associative" mean in this situation?

wayen wan asked Apr 19 '12 01:04




1 Answer

Assume you have a list of numbers, 1 2 3 4 5 6.

Associative here means you can take your operation and apply it to any subgroup, then apply it to the result of those and get the same answer:

(1) + (2 + 3) + (4 + 5 + 6)
  ==
(1 + 2) + (3 + 4) + (5) + (6)
  ==
...

Think of each pair of parentheses here as one execution of a combiner.
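To make the grouping idea concrete, here is a minimal Python sketch. The names `combine` and `run_job` are made up for illustration and are not Hadoop APIs; each inner list stands in for the values one combiner run happens to see.

```python
from functools import reduce
import operator

def combine(values):
    """Hypothetical combiner: pre-aggregate one group of values with +."""
    return reduce(operator.add, values)

def run_job(groups):
    """Run the combiner on each group, then reduce the partial results."""
    partials = [combine(group) for group in groups]
    return reduce(operator.add, partials)

# The two groupings from the example above.
grouping_a = [[1], [2, 3], [4, 5, 6]]
grouping_b = [[1, 2], [3, 4], [5], [6]]

total_a = run_job(grouping_a)
total_b = run_job(grouping_b)
print(total_a, total_b)
```

Both groupings give the same total, so it doesn't matter how the framework happens to batch the values.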

Commutative means that the order doesn't matter, so:

1 + 2 + 3 + 4 + 5 + 6
  ==
2 + 4 + 6 + 1 + 3 + 5
  ==
...
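A quick way to check this is to fold the same values in two different orders. The sketch below is only an illustration: `fold` is a made-up helper, and string concatenation is my own counterexample of an operation that is associative but not commutative.

```python
from functools import reduce
import random

def fold(op, values):
    """Apply a binary operation left to right over the values."""
    return reduce(op, values)

def add(a, b):
    return a + b

values = [1, 2, 3, 4, 5, 6]
shuffled = values[:]
random.Random(0).shuffle(shuffled)  # fixed seed, just to get some other order

# Addition: the arrival order of the values does not change the result.
sum_in_order = fold(add, values)
sum_shuffled = fold(add, shuffled)

# Concatenation: associative, but the order changes the result.
words = ["a", "b", "c"]
concat_forward = fold(add, words)
concat_reversed = fold(add, list(reversed(words)))

print(sum_in_order == sum_shuffled, concat_forward == concat_reversed)
```

Since the order in which values reach a combiner or reducer is generally not something you control, an order-sensitive operation like the concatenation here is not a safe combiner.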

Addition, for example, has both properties, as shown above. "Maximum" has them as well, because the max of maxes is the max, and max(a,b) == max(b,a).

Median is an example that doesn't work: the median of medians is not the true median.
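Here is a small sketch of the median problem, reusing the numbers 1 to 6 and the grouping (1), (2, 3), (4, 5, 6) from above. The variable names are just for illustration; `statistics.median` is from the Python standard library.

```python
from statistics import median

# Each inner list stands in for the values one combiner run would see.
groups = [[1], [2, 3], [4, 5, 6]]
all_values = [v for group in groups for v in group]

# Median: combining per group first gives a different answer.
true_median = median(all_values)
median_of_medians = median([median(group) for group in groups])

# Max: combining per group first gives the same answer.
true_max = max(all_values)
max_of_maxes = max([max(group) for group in groups])

print(true_median, median_of_medians)
print(true_max, max_of_maxes)
```

The per-group medians throw away information the final step needs, while the per-group maxes do not.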


Don't forget another important property of a combiner: the input types for the key/value and the output types of the key/value need to be the same. For example, you can't take in a string:int and return a string:float.

Oftentimes, the reducer outputs some sort of string instead of a numerical value, which may prevent you from simply plugging in your reducer as the combiner.
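A rough Python sketch of that situation, under the assumption of a word-count style job. `sum_reducer` and `sum_combiner` are hypothetical functions, not Hadoop classes. In Hadoop the mismatch would show up in the declared key/value types; in this untyped sketch it surfaces as a runtime error instead.

```python
def sum_reducer(key, values):
    """Hypothetical reducer: takes ints, but emits a formatted string."""
    return key, f"total={sum(values)}"

def sum_combiner(key, values):
    """Hypothetical combiner: takes ints and emits an int, same types in and out."""
    return key, sum(values)

# Correct setup: the combiner's output has the type the reducer expects.
partials = [sum_combiner("word", [1, 1, 1])[1], sum_combiner("word", [1, 1])[1]]
final = sum_reducer("word", partials)

# Misuse: plugging the reducer in as the combiner hands strings to the reducer.
bad_partials = [sum_reducer("word", [1, 1, 1])[1], sum_reducer("word", [1, 1])[1]]
try:
    sum_reducer("word", bad_partials)
    reducer_as_combiner_failed = False
except TypeError:
    reducer_as_combiner_failed = True

print(final, reducer_as_combiner_failed)
```

Keeping the formatting in the reducer only, with a separate combiner that preserves the numeric type, avoids the problem.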

Donald Miner answered Dec 06 '22 00:12