
hadoop: difference between 0 reducer and identity reducer?


I am just trying to confirm my understanding of the difference between 0 reducers and the identity reducer.

  • 0 reducers means the reduce step is skipped and the mapper output is the final output
  • Identity reducer means shuffling/sorting will still take place?
kee asked May 17 '12 05:05



People also ask

What happens if number of reducers is 0 in Hadoop?

If we set the number of reducers to 0 (by calling job.setNumReduceTasks(0)), no reducer executes and no aggregation takes place. Such a job is called a "map-only job" in Hadoop: each map task does all the work on its InputSplit, and no reduce phase runs.

What is identity reducer?

The identity reducer is the default reducer in the old Hadoop API. When no reducer class is set with the job.setReducerClass() method in the driver class, the identity reducer is used. It performs no processing on its input: it writes every key-value pair it receives to the output unchanged.

Can we set reducer to zero?

Yes, we can set the number of reducers to zero. This makes the job map-only: the mapper output is not sorted and is written directly to HDFS. If we want the mapper output to be sorted, we can use the identity reducer.
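
For reference, here is a rough sketch of how the two settings might look in a driver written against the newer org.apache.hadoop.mapreduce API, where the base Mapper and Reducer classes pass records through unchanged. The class name ReducerModeDriver, the job name and the three command-line arguments are made up for illustration, and the key/value types assume the default text input format. It needs the Hadoop client libraries on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: args are <input path> <output path> <maponly|identity>.
public class ReducerModeDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reducer-mode-demo");
        job.setJarByClass(ReducerModeDriver.class);

        // The base Mapper emits each (byte offset, line) record unchanged.
        job.setMapperClass(Mapper.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        if ("maponly".equals(args[2])) {
            // Map-only job: no shuffle, no sort, mapper output is the final output.
            job.setNumReduceTasks(0);
        } else {
            // The base Reducer acts as an identity reducer in this API:
            // records are still shuffled and sorted by key, then written unchanged.
            job.setReducerClass(Reducer.class);
            job.setNumReduceTasks(1);
        }

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With a single reduce task the output ends up in one file sorted by key; with more reduce tasks each output file is sorted on its own.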


1 Answer

Your understanding is correct. I would put it as follows: if you do not need the map results sorted, you set 0 reducers, and the job is called map-only.
If you need the map results sorted but do not need any aggregation, you choose the identity reducer.
And to complete the picture, there is a third case: you do need aggregation, and in that case you need a real reducer.
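
To make the first two cases concrete, here is a small plain-Java simulation. It does not use Hadoop at all, and the class and method names (ReducerModes, mapOnly, identityReducer) are invented for this sketch. It only mimics the visible effect: a map-only job leaves records in the order the mapper emitted them, while an identity reducer returns the same records after they have been grouped and sorted by key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the two modes; each record is a {key, value} string pair.
public class ReducerModes {

    // 0 reducers: records are written in the order the mapper produced them.
    public static List<String> mapOnly(List<String[]> mapperOutput) {
        List<String> out = new ArrayList<>();
        for (String[] kv : mapperOutput) {
            out.add(kv[0] + "=" + kv[1]);
        }
        return out;
    }

    // Identity reducer: records are grouped and sorted by key (the shuffle/sort
    // step), then every value is emitted unchanged.
    public static List<String> identityReducer(List<String[]> mapperOutput) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapperOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            // This toy keeps values in arrival order within a key;
            // real Hadoop makes no such promise without a secondary sort.
            for (String v : e.getValue()) {
                out.add(e.getKey() + "=" + v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
                new String[] {"pear", "1"},
                new String[] {"apple", "2"},
                new String[] {"pear", "3"},
                new String[] {"fig", "4"});
        System.out.println(mapOnly(pairs));         // [pear=1, apple=2, pear=3, fig=4]
        System.out.println(identityReducer(pairs)); // [apple=2, fig=4, pear=1, pear=3]
    }
}
```

The third case would replace the inner loop with an aggregation over each key's values, for example summing them.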

David Gruzman answered Sep 18 '22 17:09
