I have recently been reading Hadoop: The Definitive Guide, and I have two questions:
1. I saw this code for a custom Partitioner:
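import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// TextPair is the custom Writable pair class defined in the book.
public class KeyPartitioner extends Partitioner<TextPair, Text> {
    @Override
    public int getPartition(TextPair key, Text value, int numPartitions) {
        return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}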
What does & Integer.MAX_VALUE do here, and why use the & operator at all?
2. I also want to write a custom Partitioner for IntWritable keys. Is it OK (or even best) to use key.get() % numPartitions directly?
As I already wrote in the comments, it is used to keep the resulting integer non-negative.
Let's take a simple example with a String:
String h = "Hello I'm negative!";
int hashCode = h.hashCode();
hashCode is negative, with the value -1937832979.
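If you take this modulo a positive number (> 0), such as the number of partitions, the result is never positive: Java's % operator gives the remainder the sign of the dividend, so it is negative unless the hash happens to be an exact multiple of that number.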
System.out.println(hashCode % 5); // yields -4
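The sign behavior can be checked with a small standalone program. It hard-codes the hash value -1937832979 quoted above instead of recomputing it from the String, and the class and method names are illustrative only, not part of Hadoop.

```java
public class ModSignDemo {
    // Hash value quoted in the text for "Hello I'm negative!" (hard-coded, not recomputed).
    static final int HASH = -1937832979;

    // Naive partition index: Java's % keeps the sign of the dividend.
    static int naivePartition(int hash, int numPartitions) {
        return hash % numPartitions;
    }

    // Masked partition index: clear the sign bit first, so the dividend is never negative.
    static int maskedPartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(naivePartition(HASH, 5));  // prints -4
        System.out.println(maskedPartition(HASH, 5)); // prints 4
    }
}
```
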
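Since a partition index can never be negative, you need to make sure the number is non-negative. This is where a simple bit-twiddling trick comes into play: Integer.MAX_VALUE has every bit set except the sign bit (the most significant bit of Java's 32-bit two's-complement int; byte order plays no role here), and the sign bit is 1 only for negative numbers.
So when you AND any int with Integer.MAX_VALUE, its sign bit is ANDed with the zero in Integer.MAX_VALUE, which always gives zero, while all the other bits are kept unchanged.
The result is therefore always non-negative, and the following % numPartitions yields a value between 0 and numPartitions - 1.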
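To see what the mask does bit-wise, the following sketch (the class and helper name are illustrative) applies x & Integer.MAX_VALUE to a few values. For a negative x, clearing the sign bit is the same as adding 2^31, so the result is not the absolute value of x.

```java
public class SignBitMask {
    // Clears bit 31 (the sign bit of a two's-complement int); all other bits are kept.
    static int clearSignBit(int x) {
        return x & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Integer.MAX_VALUE is a 0 followed by 31 ones; toBinaryString drops the leading 0.
        System.out.println(Integer.toBinaryString(Integer.MAX_VALUE).length()); // prints 31

        int[] samples = {5, -1, -5, Integer.MIN_VALUE, -1937832979};
        for (int x : samples) {
            // Non-negative inputs are unchanged; negative inputs become x + 2^31.
            System.out.println(x + " -> " + clearSignBit(x));
        }
    }
}
```

For instance, -1 maps to 2147483647 and Integer.MIN_VALUE maps to 0, which is harmless for partitioning: the value only has to be non-negative, not equal to |x|.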
You can make it more readable, though:
return Math.abs(key.getFirst().hashCode() % numPartitions);
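The order of Math.abs and % matters. The sketch below (method names are illustrative) compares the mask version, the Math.abs-after-modulo version above, and a tempting abs-before-modulo variant shown only for contrast. Math.abs(Integer.MIN_VALUE) overflows and stays negative, so only the last variant can return a negative index.

```java
public class AbsVersusMask {
    // Version from the book: clear the sign bit, then take the remainder.
    static int maskThenMod(int hash, int n) {
        return (hash & Integer.MAX_VALUE) % n;
    }

    // Version from this answer: remainder first, then absolute value.
    // |hash % n| < n, so Math.abs never receives Integer.MIN_VALUE here (for n >= 1).
    static int modThenAbs(int hash, int n) {
        return Math.abs(hash % n);
    }

    // Buggy variant for contrast: Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE.
    static int absThenMod(int hash, int n) {
        return Math.abs(hash) % n;
    }

    public static void main(String[] args) {
        System.out.println(maskThenMod(-1, 5));                // prints 2
        System.out.println(modThenAbs(-1, 5));                 // prints 1
        System.out.println(absThenMod(Integer.MIN_VALUE, 5));  // prints -3
    }
}
```

Both safe versions always land in [0, n), but they can send the same negative hash to different partitions, so code that needs consistent partitioning should stick to one of them.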
For example, I did the same in Apache Hama's partitioner for arbitrary objects:
@Override
public int getPartition(K key, V value, int numTasks) {
    return Math.abs(key.hashCode() % numTasks);
}
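For the IntWritable case from question 2: key.get() % numPartitions on its own is only safe if every key is non-negative, because a negative key produces a negative index. A minimal sketch follows, assuming you would call the hypothetical helper partitionFor(key.get(), numPartitions) from getPartition in a Partitioner<IntWritable, V>; it is written without Hadoop so it runs standalone. Note that Hadoop's default HashPartitioner already computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks and IntWritable.hashCode() returns the wrapped int, so for plain IntWritable keys the default partitioner gives the same result.

```java
public class IntKeyPartitioning {
    // Hypothetical helper: the body you might use in getPartition(IntWritable key, V value,
    // int numPartitions), called as partitionFor(key.get(), numPartitions).
    static int partitionFor(int key, int numPartitions) {
        // key % numPartitions alone would be negative for negative keys,
        // so apply the same sign-bit mask as KeyPartitioner.
        return (key & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        for (int key : new int[] {0, 7, -7, Integer.MIN_VALUE}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```
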