Same key, different reducers (Hadoop)?

Is it possible to process values with the same key on different reducers? From all the mappers I get data with the same key, and I want to process it on different reducers. My confusion is that the book says all values with the same key will go to the same reducer...

 mapper1(k1,v1),mapper2(k1,v2),mapper3(k1,v3) and so on...

I don't want all the data to go to the same reducer. It should be like:

 reducer1(k1,v1),reducer2(k1,v2)....

Let's say reducer1 produces sum1 and reducer2 produces sum2, and I want:

 sum=sum2+sum1

How should I do that?

Divyendra asked Apr 23 '13 17:04



People also ask

Can we have multiple reducers in MapReduce?

If there are a lot of key-value pairs to merge, a single reducer might take too much time. To keep the reducer machine from becoming the bottleneck, we use multiple reducers. With multiple reducers, each node running a mapper partitions its output key-value pairs into one bucket per reducer, sorted by key within each bucket.
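To make the bucketing concrete, here is a plain-Java simulation with no Hadoop dependency (the class and method names are hypothetical). The `partitionFor` arithmetic mirrors what Hadoop's default `HashPartitioner` does, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, applied here to Java `String` keys. Real `Text` keys hash differently, but the property that matters holds either way: every record with the same key lands in the same bucket, no matter which mapper emitted it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of splitting map output into one bucket per reducer.
// partitionFor mirrors the formula used by Hadoop's default HashPartitioner.
class PartitionDemo {

    // Clear the sign bit so the result is non-negative, then take it modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups (key, value) pairs into per-reducer buckets; keys are sorted within each bucket.
    static List<Map<String, List<Integer>>> bucketize(List<Map.Entry<String, Integer>> records,
                                                      int numReduceTasks) {
        List<Map<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new TreeMap<>()); // TreeMap keeps keys sorted, like the shuffle does
        }
        for (Map.Entry<String, Integer> r : records) {
            int p = partitionFor(r.getKey(), numReduceTasks);
            buckets.get(p).computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Output of hypothetical mappers; three of them emit the same key "k1".
        List<Map.Entry<String, Integer>> records = List.of(
                Map.entry("k1", 1), Map.entry("k1", 2), Map.entry("k1", 3), Map.entry("k2", 4));
        List<Map<String, List<Integer>>> buckets = bucketize(records, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer" + i + " <- " + buckets.get(i));
        }
    }
}
```

Whatever the number of reducers, all three `k1` values end up in a single bucket.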

How do two reducers communicate with each other?

They don't. Reducers always run in isolation and can never communicate with each other; this is part of the Hadoop MapReduce programming paradigm.

How does Hadoop determine number of reducers?

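A commonly recommended number of reducers is 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). With 0.95, all reducers can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reducers and launch a second wave, which gives much better load balancing.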
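A quick worked example of that rule of thumb, assuming a hypothetical cluster of 10 nodes with at most 8 containers each (80 container slots). The factor is expressed in hundredths so the arithmetic stays in integers and avoids floating-point rounding; integer division rounds down.

```java
// Rule of thumb: reducers = factor * (nodes * max containers per node).
// factorPercent is the factor in hundredths (95 for 0.95, 175 for 1.75).
class ReducerCount {

    static int suggestedReducers(int nodes, int maxContainersPerNode, int factorPercent) {
        // Integer division rounds down toward zero for these positive inputs.
        return nodes * maxContainersPerNode * factorPercent / 100;
    }

    public static void main(String[] args) {
        int nodes = 10;               // hypothetical cluster size
        int maxContainersPerNode = 8; // hypothetical per-node container limit
        System.out.println("0.95 -> " + suggestedReducers(nodes, maxContainersPerNode, 95));  // 76
        System.out.println("1.75 -> " + suggestedReducers(nodes, maxContainersPerNode, 175)); // 140
    }
}
```

The chosen value would then be set on the job, for example with `Job.setNumReduceTasks(int)` or the `mapreduce.job.reduces` property.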

What is a reducer in Hadoop?

In Hadoop, the Reducer takes the output of the Mapper (intermediate key-value pairs) and processes it to generate the output. The reducer's output is the final output, which is stored in HDFS. Reducers typically perform aggregation, such as summation.
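A minimal plain-Java sketch of a summing reducer, assuming the shuffle has already grouped the values by key. Its `reduce` method mirrors the shape of Hadoop's `Reducer.reduce(key, values, context)`, but it does not use the Hadoop API; `SumReducerDemo` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for a summing reducer: it receives every value for one key
// (already grouped by the shuffle) and produces a single sum for that key.
class SumReducerDemo {

    // The key is unused here; it is kept only to mirror the Reducer signature.
    static long reduce(String key, Iterable<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Grouped input as one reducer would see it after the shuffle.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("k1", List.of(1, 2, 3));
        grouped.put("k2", List.of(10, 20));
        // Emit tab-separated "key<TAB>sum" lines, one per key.
        grouped.forEach((k, vs) -> System.out.println(k + "\t" + reduce(k, vs)));
    }
}
```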


1 Answer

Data with the same key will always go to the same reducer. But you can choose whatever key you want, so if you want them to go to different reducers, then just choose different keys.
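A sketch of the "choose different keys" idea, often called key salting, simulated in plain Java. Everything here is a hypothetical illustration: the map step appends `#<salt>` to the key, the salt is the record's position modulo the number of reducers (a real mapper might use a per-mapper counter or a random number instead), and a custom partition function routes records by the salt. That last point matters because distinct keys such as `k1#0` and `k1#1` are not guaranteed to hash to different reducers under the default partitioner; in real Hadoop the routing would go in a subclass of `org.apache.hadoop.mapreduce.Partitioner` that overrides `getPartition`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Key salting: the map side rewrites k1 into k1#0, k1#1, ... so the values no
// longer share a key and can be summed on different reducers.
class SaltedKeysDemo {

    static final int NUM_REDUCERS = 2;

    // Hypothetical map step: append a salt derived from the record's position.
    static String saltedKey(String key, int recordIndex) {
        return key + "#" + (recordIndex % NUM_REDUCERS);
    }

    // Hypothetical partitioner: route by the salt so k1#0 and k1#1 are
    // guaranteed to reach different reducers (plain hashing would not guarantee it).
    static int partitionFor(String saltedKey) {
        String salt = saltedKey.substring(saltedKey.lastIndexOf('#') + 1);
        return Integer.parseInt(salt) % NUM_REDUCERS;
    }

    // Simulated job: returns, per reducer, a map of salted key -> partial sum.
    static List<Map<String, Long>> run(String key, int[] values) {
        List<Map<String, Long>> reducers = new ArrayList<>();
        for (int i = 0; i < NUM_REDUCERS; i++) {
            reducers.add(new TreeMap<>());
        }
        for (int i = 0; i < values.length; i++) {
            String sk = saltedKey(key, i);
            reducers.get(partitionFor(sk)).merge(sk, (long) values[i], Long::sum);
        }
        return reducers;
    }

    public static void main(String[] args) {
        // v1..v4 from hypothetical mapper outputs, all with key k1.
        List<Map<String, Long>> out = run("k1", new int[] {1, 2, 3, 4});
        for (int i = 0; i < out.size(); i++) {
            System.out.println("reducer" + (i + 1) + " -> " + out.get(i));
        }
    }
}
```

With inputs 1, 2, 3, 4 this yields a partial sum of 4 on reducer1 and 6 on reducer2, which play the roles of the question's sum1 and sum2.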

If you want to do an additional combination based on the output from your reducers, then you must do another MapReduce job, with the output from the first job as the input to the next one. This can get ugly fast, so you may wish to look at Cascading, Pig, or Hive to simplify things.
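The second job can be sketched in plain Java as well. Assume, hypothetically, that the first job salted its keys with a `#<salt>` suffix and wrote partial sums such as `k1#0 -> sum1` and `k1#1 -> sum2`. The second job's map step strips the suffix so that all partial sums for `k1` share a key again and meet on one reducer, which computes sum = sum1 + sum2.

```java
import java.util.Map;
import java.util.TreeMap;

// Second MapReduce pass: input is the first job's output (salted key -> partial sum).
class SecondPassDemo {

    // Map step: "k1#1" -> "k1" (assumes the salt is everything after the last '#').
    static String originalKey(String saltedKey) {
        int i = saltedKey.lastIndexOf('#');
        return i < 0 ? saltedKey : saltedKey.substring(0, i);
    }

    // Shuffle + reduce: group by original key and add up the partial sums.
    static Map<String, Long> combine(Map<String, Long> firstJobOutput) {
        Map<String, Long> totals = new TreeMap<>();
        firstJobOutput.forEach((k, partial) -> totals.merge(originalKey(k), partial, Long::sum));
        return totals;
    }

    public static void main(String[] args) {
        // Partial sums as written by reducer1 (sum1 = 4) and reducer2 (sum2 = 6).
        Map<String, Long> firstJobOutput = Map.of("k1#0", 4L, "k1#1", 6L);
        System.out.println(combine(firstJobOutput)); // prints {k1=10}
    }
}
```

In an actual Hadoop driver, the two jobs would typically be chained by waiting for the first with `job.waitForCompletion(true)` and then pointing the second job's input path at the first job's output directory.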

Joe K answered Sep 19 '22 00:09
