Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Good hashCode() Implementation

Tags:

java

hash

The accepted answer in Best implementation for hashCode method gives a seemingly good method for finding Hash Codes. But I'm new to Hash Codes, so I don't quite know what to do.

For 1), does it matter what nonzero value I choose? Is 1 just as good as other numbers such as the prime 31?

For 2), do I add each value to c? What if I have two fields that are both a long, int, double, etc?


Did I interpret it right in this class:

public MyClass{
    long a, b, c; // these are the only fields
    //some code and methods
    public int hashCode(){
        return 37 * (37 * ((int) (a ^ (a >>> 32))) + (int) (b ^ (b >>> 32))) 
                 + (int) (c ^ (c >>> 32));
    }
}
like image 357
Justin Avatar asked May 18 '13 23:05

Justin


2 Answers

  1. The value is not important, it can be whatever you want. Prime numbers will result in a better distribution of the hashCode values therefore they are preferred.
  2. You do not necessary have to add them, you are free to implement whatever algorithm you want, as long as it fulfills the hashCode contract:
  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

There are some algorithms which can be considered as not good hashCode implementations, simple adding of the attributes values being one of them. The reason for that is, if you have a class which has two fields, Integer a, Integer b and your hashCode() just sums up these values then the distribution of the hashCode values is highly depended on the values your instances store. For example, if most of the values of a are between 0-10 and b are between 0-10 then the hashCode values are be between 0-20. This implies that if you store the instance of this class in e.g. HashMap numerous instances will be stored in the same bucket (because numerous instances with different a and b values but with the same sum will be put inside the same bucket). This will have bad impact on the performance of the operations on the map, because when doing a lookup all the elements from the bucket will be compared using equals().

Regarding the algorithm, it looks fine, it is very similar to the one that Eclipse generates, but it is using a different prime number, 31 not 37:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (int) (a ^ (a >>> 32));
    result = prime * result + (int) (b ^ (b >>> 32));
    result = prime * result + (int) (c ^ (c >>> 32));
    return result;
}
like image 143
Adam Siemion Avatar answered Oct 21 '22 16:10

Adam Siemion


A well-behaved hashcode method already exists for long values - don't reinvent the wheel:

int hashCode = Long.hashCode((a * 31 + b) * 31 + c); // Java 8+

int hashCode = Long.valueOf((a * 31 + b) * 31 + c).hashCode() // Java <8

Multiplying by a prime number (usually 31 in JDK classes) and cumulating the sum is a common method of creating a "unique" number from several numbers.

The hashCode() method of Long keeps the result properly distributed across the int range, making the hash "well behaved" (basically pseudo random).

like image 21
Bohemian Avatar answered Oct 21 '22 16:10

Bohemian