I have written a custom partitioner. When I have number of reduce tasks greater than 1, the job is failing. This is the exception which I'm getting:
java.io.IOException: Illegal partition for weburl_compositeKey@804746b1 (-1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:930)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:499)
The code which I have written is
public int getPartition(weburl_compositeKey key, Text value, int numPartitions)
{
return (key.hashCode()) % numPartitions;
}
This the key.hashCode()
equals -719988079
and mod of this value is returning -1
.
Appreciate your help on this. Thanks.
The default partitioner in Hadoop is the HashPartitioner which has a method called getPartition . It takes key.
A partitioner partitions the key-value pairs of intermediate Map-outputs. It partitions the data using a user-defined condition, which works like a hash function. The total number of partitions is same as the number of Reducer tasks for the job.
Custom Partitioner is a process that allows you to store the results in different reducers, based on the user condition. By setting a partitioner to partition by the key, we can guarantee that, records for the same key will go to the same reducer.
Hadoop Partitioning specifies that all the values for each key are grouped together. It also makes sure that all the values of a single key go to the same reducer. This allows even distribution of the map output over the reducer.
The calculated partition number by your custom Partitioner
has to be non-negative. Try:
public int getPartition(weburl_compositeKey key, Text value, int numPartitions)
{
return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
}
A warning about using:
public int getPartition(weburl_compositeKey key, Text value, int numPartitions)
{
return Math.abs(key.hashCode()) % numPartitions;
}
If you hit the case where the key.hashCode()
is equal to Integer.MIN_VALUE
you will still get a negative partition value. It is an oddity of Java, but Math.abs(Integer.MIN_VALUE)
returns Integer.MIN_VALUE
( as in -2147483648). You are safer taking the absolute value of the modulus, as in:
public int getPartition(weburl_compositeKey key, Text value, int numPartitions)
{
return Math.abs(key.hashCode() % numPartitions);
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With