Is there a way in Hadoop to ensure that every reducer gets only one key that is output by the mapper?
This question is a bit unclear to me, but I think I have a pretty good idea of what you want.
First of all, if you do nothing special, every time reduce is called it gets exactly one key with a set of one or more values (via an iterator).
My guess is that you want to ensure that every reducer gets exactly one 'key-value pair'. There are essentially two ways of doing that:
So, if I understand your question correctly, you should implement a GroupComparator that simply states that all keys are different and should therefore be sent to a different reduce call.
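To make that idea concrete, here is a minimal plain-Java simulation of the grouping step. It uses no Hadoop classes: `GroupingSketch`, `group` and `ALWAYS_DIFFERENT` are hypothetical names invented for this sketch, and it assumes the framework only compares each key with the one before it in sorted order and starts a new reduce call when they differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of grouping; not part of the Hadoop API.
final class GroupingSketch {

    // Splits already-sorted keys into runs. A new run (one simulated
    // reduce call) starts whenever the grouping comparator reports that
    // the current key differs from the previous one.
    static List<List<String>> group(List<String> sortedKeys, Comparator<String> grouping) {
        List<List<String>> calls = new ArrayList<>();
        String previous = null;
        for (String key : sortedKeys) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(key);
            previous = key;
        }
        return calls;
    }

    // Never returns 0, so no two keys are ever grouped together.
    // This deliberately breaks the usual Comparator contract, which is
    // acceptable here only because it is used for grouping, not sorting.
    static final Comparator<String> ALWAYS_DIFFERENT = (a, b) -> -1;

    public static void main(String[] args) {
        // Each element stands for the key of one key-value pair.
        List<String> sorted = Arrays.asList("a", "a", "b");
        System.out.println(group(sorted, Comparator.naturalOrder())); // [[a, a], [b]]
        System.out.println(group(sorted, ALWAYS_DIFFERENT));          // [[a], [a], [b]]
    }
}
```

With the default-style grouping the two "a" pairs share one reduce call; with the always-different comparator every pair gets its own call.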
Because of other answers in this question I'm adding a bit more detail:
There are 3 methods used for comparing keys (I pulled these code samples from a project I did using the 0.18.3 API):
Partitioner
conf.setPartitionerClass(KeyPartitioner.class);
The partitioner is only there to ensure that "things that must be the same end up on the same partition". If you have only one reduce task there is only one partition, so this won't help much.
Key Comparator
conf.setOutputKeyComparatorClass(KeyComparator.class);
The key comparator is used to SORT the "key-value pairs" in a group by looking at the key ... which must be different somehow.
Group Comparator
conf.setOutputValueGroupingComparator(GroupComparator.class);
The group comparator is used to group keys that are different, yet must be sent to the same reducer.
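The interplay between the key comparator and the group comparator can be sketched in plain Java, without any Hadoop classes. Everything here is hypothetical and made up for illustration: the `CompositeKey` type (a natural part plus a part used only for ordering), the two comparator constants, and the `sortAndGroup` helper that mimics sorting followed by grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of sort + group; not part of the Hadoop API.
final class SecondarySortSketch {

    // Hypothetical composite key: a natural part plus an ordering part.
    static final class CompositeKey {
        final String natural;
        final int order;

        CompositeKey(String natural, int order) {
            this.natural = natural;
            this.order = order;
        }

        @Override
        public String toString() {
            return natural + "#" + order;
        }
    }

    // Key comparator role: sorts on both parts of the key.
    static final Comparator<CompositeKey> KEY_COMPARATOR =
            Comparator.comparing((CompositeKey k) -> k.natural)
                      .thenComparingInt(k -> k.order);

    // Group comparator role: looks at the natural part only, so keys
    // that differ in the ordering part still share one reduce call.
    static final Comparator<CompositeKey> GROUP_COMPARATOR =
            Comparator.comparing((CompositeKey k) -> k.natural);

    // Sorts the keys, then starts a new simulated reduce call whenever
    // the group comparator says the key differs from the previous one.
    static List<List<CompositeKey>> sortAndGroup(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(KEY_COMPARATOR);
        List<List<CompositeKey>> calls = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            if (previous == null || GROUP_COMPARATOR.compare(previous, key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(key);
            previous = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
                new CompositeKey("b", 2), new CompositeKey("a", 3),
                new CompositeKey("a", 1), new CompositeKey("b", 1));
        System.out.println(sortAndGroup(keys)); // [[a#1, a#3], [b#1, b#2]]
    }
}
```

The keys a#1 and a#3 are different as far as sorting is concerned, yet they arrive in the same simulated reduce call, in sorted order.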
HTH
You can get some control over which keys get sent to which reducers by implementing the Partitioner interface.
From the Hadoop API docs:
Partitioner controls the partitioning of the keys of the intermediate map-outputs. The key (or a subset of the key) is used to derive the partition, typically by a hash function. The total number of partitions is the same as the number of reduce tasks for the job. Hence this controls which of the m reduce tasks the intermediate key (and hence the record) is sent for reduction.
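A minimal plain-Java sketch of the hash-based derivation described in that quote might look like the following. It does not implement the Hadoop interface: `PartitionSketch` and `partitionFor` are hypothetical names, and the formula (non-negative hash code modulo the number of reduce tasks) is the usual hash-partitioning approach, assumed here for illustration.

```java
// Plain-Java simulation of hash partitioning; not part of the Hadoop API.
final class PartitionSketch {

    // Maps a key to a partition in the range [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit so that a
    // negative hash code cannot produce a negative partition number.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        for (String key : new String[] {"a", "b", "e"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
        // "a" and "e" both land in partition 1: equal keys always share a
        // partition, but distinct keys can share one as well.
    }
}
```

This is why a partitioner on its own cannot guarantee one key per reducer: whenever there are more distinct keys than reduce tasks, some partition has to hold several keys.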
The following book does a great job of describing partitioning, key sorting strategies, and their tradeoffs, along with other issues in MapReduce algorithm design: http://www.umiacs.umd.edu/~jimmylin/book.html