Using Hadoop, are my reducers guaranteed to get all the records with the same key?

I'm running a Hadoop job (using Hive, actually) that is supposed to deduplicate lines across many text files. In the reduce step, it keeps the most recently timestamped record for each key.
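For illustration only, the reduce step described here could be sketched as below. This is a plain Python stand-in, not the actual Hive job; the function name `reduce_latest` and the `(timestamp, line)` record shape are assumptions made for the example.

```python
def reduce_latest(key, records):
    """Keep the record with the largest timestamp for one key.

    `records` is assumed to be an iterable of (timestamp, line) tuples,
    i.e. every value the shuffle delivered for `key`.
    """
    # max() compares on the timestamp only; ties keep the first record seen.
    latest = max(records, key=lambda rec: rec[0])
    return key, latest
```

The result is only correct if the reducer sees every record for the key, which is exactly what the question is about.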

Does Hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if many reducers are running across a cluster?

I worry that, after the shuffle, the mapper output might be split in the middle of a set of records with the same key, so that those records end up on different reducers.

samg asked Apr 13 '10 21:04


1 Answer

All values for a key are sent to the same reducer. See this Yahoo! tutorial for more discussion.

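This behavior is determined by the partitioner. The default, HashPartitioner, picks the reducer from a hash of the key alone, so records with equal keys always land on the same reducer. The guarantee might not hold if you use a partitioner other than the default.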
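To see why the default partitioner keeps equal keys together, here is a minimal Python sketch of the idea. Hadoop's HashPartitioner computes `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the code below imitates that formula. The helper names (`java_string_hash`, `partition`, `shuffle`) are made up for this example, and the hash mimics Java's `String.hashCode()`, which is an assumption: real Hadoop key types such as `Text` hash differently, but the routing principle is the same.

```python
from collections import defaultdict


def java_string_hash(s):
    """Imitate Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to Java's signed int range.
    return h - (1 << 32) if h >= (1 << 31) else h


def partition(key, num_reducers):
    """Same shape as HashPartitioner: mask the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Group (key, value) pairs by the reducer index chosen for each key."""
    reducers = defaultdict(list)
    for key, value in map_outputs:
        reducers[partition(key, num_reducers)].append((key, value))
    return reducers


# Records with the same key, possibly emitted by different mappers.
outputs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
by_reducer = shuffle(outputs, 3)
```

Because the reducer index depends only on the key and the number of reducers, it does not matter which mapper emitted a record or in what order. A custom partitioner that looks at anything other than the key (or at only part of it) can break this property.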

Karl Anderson answered Sep 28 '22 09:09