Partitioning: how does Hadoop do it? Does it use a hash function? What is the default function?

Tags: hash, hadoop, partitioning

Partitioning is the process of determining which reducer instance receives which intermediate keys and values. Each mapper must determine, for every (key, value) pair it outputs, which reducer will receive it. For any given key, the destination partition must be the same regardless of which mapper instance generated it. Problem: How does Hadoop do this? Does it use a hash function? What is the default function?


asked Aug 27 '13 16:08

cherri_zj

1 Answer

The default partitioner in Hadoop is HashPartitioner. Its getPartition method computes key.hashCode() & Integer.MAX_VALUE, which clears the sign bit so the value is non-negative, and then takes the remainder of that value modulo the number of reduce tasks.

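For example, if there are 10 reduce tasks, getPartition returns a value from 0 through 9 for every key. Because the result depends only on the key's hashCode() and the number of reduce tasks, every mapper sends a given key to the same reducer, provided the key class has a deterministic hashCode() (Hadoop's built-in Writable key types do).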
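To check the arithmetic without a Hadoop cluster, here is a standalone sketch that applies the same formula to plain Java objects. HashPartitionDemo and its methods are hypothetical names chosen for illustration; the naive and Math.abs variants are not Hadoop code and are included only for comparison, to show why the sign bit is masked off instead.

```java
// Standalone sketch of the HashPartitioner arithmetic; no Hadoop dependency.
// HashPartitionDemo and its methods are hypothetical names for illustration.
final class HashPartitionDemo {

    private HashPartitionDemo() {}

    // Same formula as HashPartitioner.getPartition: clear the sign bit so the
    // value is non-negative, then reduce it modulo the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Comparison only: Java's % keeps the sign of the dividend, so a negative
    // hashCode() produces a negative, invalid partition number.
    static int naivePartitionFor(Object key, int numReduceTasks) {
        return key.hashCode() % numReduceTasks;
    }

    // Comparison only: Math.abs(Integer.MIN_VALUE) overflows and stays negative,
    // so this variant still fails for that one hash value.
    static int absPartitionFor(Object key, int numReduceTasks) {
        return Math.abs(key.hashCode()) % numReduceTasks;
    }
}
```

For example, an Integer key's hashCode() is its own value, so with 10 reduce tasks partitionFor(-7, 10) returns 1, while naivePartitionFor(-7, 10) returns -7. For the key Integer.MIN_VALUE, absPartitionFor returns -8, whereas partitionFor returns 0.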

Here is the code of Hadoop's HashPartitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner):

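import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    // Mask off the sign bit so the hash is non-negative, then map it into [0, numReduceTasks).
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}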
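The question's core requirement is that a key reaches the same reducer no matter which mapper emitted it. The toy simulation below (plain Java, not Hadoop code) routes keys from two simulated mappers into reducer buckets with the same formula. ShuffleSimulation, partition and route are hypothetical names, and the keys are java.lang.String; Hadoop's Text has its own hashCode(), so the actual reducer numbers in a real job would differ even though the routing principle is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy simulation of how partitioned map output is routed to reducers.
// Hypothetical names; String keys stand in for Hadoop key types.
final class ShuffleSimulation {

    private ShuffleSimulation() {}

    // Same formula as HashPartitioner.getPartition.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Sends every key emitted by every mapper to a reducer bucket.
    // Returns reducer index -> keys received, in the order the mappers emitted them.
    static Map<Integer, List<String>> route(List<List<String>> mapperOutputs, int numReduceTasks) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (List<String> mapperOutput : mapperOutputs) {
            for (String key : mapperOutput) {
                buckets.computeIfAbsent(partition(key, numReduceTasks), r -> new ArrayList<>())
                       .add(key);
            }
        }
        return buckets;
    }
}
```

If one mapper emits "apple", "banana", "cherry" and another emits "cherry", "apple", "date", both occurrences of "apple" (and of "cherry") end up in a single bucket, because neither the mapper's identity nor the order of emission enters the formula.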

To create a custom partitioner, extend Partitioner, override getPartition, and then register your class in the driver code with job.setPartitionerClass(CustomPartitioner.class). This is particularly helpful for operations such as secondary sort.
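As an illustration of the secondary-sort case, here is a minimal sketch of a partitioner that hashes only the "natural" part of a composite key, so all records sharing that part reach one reducer whatever their secondary field is. To keep it compilable without Hadoop on the classpath, it declares a small stand-in for org.apache.hadoop.mapreduce.Partitioner with the same getPartition signature; in a real job you would extend Hadoop's class instead. CompositeKey and NaturalKeyPartitioner are hypothetical names, and a real CompositeKey would need to implement WritableComparable.

```java
// Stand-in for org.apache.hadoop.mapreduce.Partitioner so this sketch compiles
// without Hadoop; a real job extends Hadoop's class instead.
abstract class Partitioner<K, V> {
    public abstract int getPartition(K key, V value, int numPartitions);
}

// Hypothetical composite key for a secondary sort: records are grouped by
// naturalKey and ordered by timestamp. A real job would make it a WritableComparable.
final class CompositeKey {
    final String naturalKey;
    final long timestamp;

    CompositeKey(String naturalKey, long timestamp) {
        this.naturalKey = naturalKey;
        this.timestamp = timestamp;
    }
}

// Partitions on the natural key only, so every record for a given naturalKey
// goes to the same reducer regardless of its timestamp.
final class NaturalKeyPartitioner extends Partitioner<CompositeKey, Object> {
    @Override
    public int getPartition(CompositeKey key, Object value, int numPartitions) {
        // Same sign-masking and modulo as HashPartitioner, applied to part of the key.
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With Hadoop's real Partitioner as the superclass, the driver would register it with job.setPartitionerClass(NaturalKeyPartitioner.class). Hashing the whole composite key instead would scatter one natural key's records across reducers, which defeats the secondary sort.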


answered Sep 28 '22 08:09

tommy_o

Related questions

hdfs copy multiple files to same target directory
Hadoop streaming job failure: Task process exit with nonzero status of 137
finding mean using pig or hadoop
Merging multiple sequence files into one sequencefile within Hadoop
Hadoop and Amazon Web Services [closed]
Map Reduce output to CSV or do I need Key Values?
What kind of JBOD in hadoop? and COW with hadoop?
How to set the VCORES in hadoop mapreduce/yarn?
HIVE Insert overwrite into a partitioned Table
How can I check the settings in hive CLI?
Why declaring Mapper and Reducer classes as static?
AWS EMR performance HDFS vs S3
Usecases for mapred.job.queue.name
Hadoop commands
Hadoop Name Node format fails
How to access files in Hadoop HDFS?
Hadoop streaming grep does not work
Convert "3" to 3 with PigLatin
Wrong key class: Text is not IntWritable
Why is it keep showing deprecated error when running hadoop (or dfs command)
