
What does Text.hashCode() & Integer.MAX_VALUE mean?

Tags:

hadoop

I have recently been reading Hadoop: The Definitive Guide, and I have two questions:

1. I saw this code for a custom Partitioner:

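import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// TextPair is the custom key class from the book; it is not part of Hadoop itself.
public class KeyPartitioner extends Partitioner<TextPair, Text> {

    @Override
    public int getPartition(TextPair key, Text value, int numPartitions) {
        return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}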

What does & Integer.MAX_VALUE do here? Why is the & operator needed?

2. I also want to write a custom Partitioner for IntWritable. Is it OK, and is it the best approach, to use key.value % numPartitions directly?

JoJo asked May 18 '13 06:05



1 Answer

As I already wrote in the comments, it is used to keep the resulting integer non-negative.

Here is a simple example using a String:

String h = "Hello I'm negative!";
int hashCode = h.hashCode();

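hashCode is negative, with the value -1937832979.

If you take this modulo a positive number (>0) that denotes the number of partitions, the result is never positive: in Java the sign of a % result follows the dividend, so a negative hash code gives a negative result or zero.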

System.out.println(hashCode % 5); // yields -4

Since a partition number can never be negative, you need to make sure the result is non-negative. This is where a simple bit-twiddling trick comes into play: Integer.MAX_VALUE has every bit set to one except the sign bit (the most significant bit of Java's two's-complement int), which is 1 only for negative numbers.

So if you have a negative number, its sign bit is ANDed with the zero in Integer.MAX_VALUE, which always yields zero. The other 31 bits pass through unchanged, so the result is always non-negative.
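To make the masking step concrete, here is a minimal sketch that applies it to the hash code from the example above. The class name SignBitMaskDemo and the helper partition are made up for illustration and are not part of Hadoop.

```java
public class SignBitMaskDemo {

    // Hypothetical helper: clears the sign bit, then reduces to [0, numPartitions).
    static int partition(int hashCode, int numPartitions) {
        return (hashCode & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int hashCode = -1937832979; // the negative hash code from the example above

        // Integer.MAX_VALUE is 0x7FFFFFFF: sign bit 0, the remaining 31 bits 1.
        System.out.println(Integer.toBinaryString(Integer.MAX_VALUE));

        // ANDing clears only the sign bit; the lower 31 bits are kept.
        System.out.println(hashCode & Integer.MAX_VALUE); // 209650669

        // Plain modulo keeps the sign of the dividend.
        System.out.println(hashCode % 5); // -4

        // Masked modulo always lands in a valid partition.
        System.out.println(partition(hashCode, 5)); // 4
    }
}
```

The mask also handles Integer.MIN_VALUE, which becomes 0 after the AND.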

You can make it more readable though:

return Math.abs(key.getFirst().hashCode() % numPartitions);

For example, I have done that in Apache Hama's partitioner for arbitrary objects:

@Override
public int getPartition(K key, V value, int numTasks) {
    return Math.abs(key.hashCode() % numTasks);
}
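Note that the order of operations in the Math.abs variant matters. The sketch below compares taking the modulo first (as in the code above) with taking the absolute value first. It also shows Math.floorMod (Java 8+) as a further alternative that the answer itself does not mention. The class and method names are hypothetical, and plain int values stand in for whatever key.hashCode() or IntWritable.get() would return.

```java
public class AbsModDemo {

    // Modulo first, then abs: |h % n| < n, so the result is always in [0, n).
    static int absOfMod(int h, int n) {
        return Math.abs(h % n);
    }

    // Abs first, then modulo: unsafe, because Math.abs(Integer.MIN_VALUE)
    // overflows and returns Integer.MIN_VALUE again.
    static int modOfAbs(int h, int n) {
        return Math.abs(h) % n;
    }

    // Math.floorMod returns a result with the sign of the divisor,
    // so it is non-negative for a positive n.
    static int floorMod(int h, int n) {
        return Math.floorMod(h, n);
    }

    public static void main(String[] args) {
        int h = Integer.MIN_VALUE;
        System.out.println(absOfMod(h, 5)); // 3
        System.out.println(modOfAbs(h, 5)); // -3, not a valid partition
        System.out.println(floorMod(h, 5)); // 2
    }
}
```

The same reasoning applies to the second question: if an IntWritable key can hold negative values, a bare modulo on its int value can return a negative partition, so one of the safe variants is needed. The variants may assign a given negative value to different partitions, but each one is deterministic, which is what a partitioner requires.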
Thomas Jungblut answered Nov 08 '22 17:11
