I have three hashCode methods as follows; I prioritised them based on their efficiency. I am wondering whether there is any other way to write a more efficient hashCode method.
1) public int hashCode() { // terrible
       return 5;
   }
2) public int hashCode() { // a bit less terrible
       return name.length();
   }
3) public int hashCode() { // better
       final int prime = 31;
       int result = 1;
       result = prime * result + ((name == null) ? 0 : name.hashCode());
       return result;
   }
Use wrappers for composite HashMap keys
Whenever a HashMap has composite String keys, use a wrapper object instead of concatenating the strings to make a key. Doing so makes lookups faster and reduces the allocation rate, because no temporary concatenated String has to be built for every get or put.
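A minimal sketch of such a wrapper key, assuming a hypothetical two-part name key (the class and field names are illustrative, not from the original):

// Hypothetical composite key: the two parts are wrapped instead of being
// concatenated into a new String for every lookup. Assumes non-null parts.
final class NameKey {
    private final String firstName;
    private final String lastName;

    NameKey(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        // Combine both parts with the usual 31 multiplier; no allocation.
        return 31 * firstName.hashCode() + lastName.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NameKey)) return false;
        NameKey other = (NameKey) o;
        return firstName.equals(other.firstName) && lastName.equals(other.lastName);
    }
}

// Usage: map.get(new NameKey("Ada", "Lovelace"))
// rather than map.get("Ada" + "|" + "Lovelace").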
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
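The same pattern extends to more than one field. A sketch (the Person class and its name/age fields are assumed for illustration, not taken from the question):

final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + age;
        // Modern JVMs can rewrite 31 * result as (result << 5) - result,
        // so the odd-prime multiplier is essentially free.
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return age == p.age
                && (name == null ? p.name == null : name.equals(p.name));
    }
}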
HashMap's Bottleneck
Because non-equal objects can have the same hash code (a phenomenon called a hash code collision), buckets can grow in size. A bucket is essentially a simple linked list (since Java 8, a bucket that grows past a threshold is converted to a balanced tree). Finding elements in a linked list is not very fast (O(n)), but that is not a problem if the list stays very small.
There is no surefire way to guarantee that your hashCode function is optimal, because it is measured by two different metrics: how quickly it can be computed, and how well it disperses values (i.e. how rarely it collides). You have to find the balance yourself.
Sometimes the choice is obvious because there is a very efficient method that never collides (e.g. the ordinal of an enum), as in the sketch below.
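A sketch of that case, assuming a hypothetical Card class whose identity is a single enum value:

enum Suit { CLUBS, DIAMONDS, HEARTS, SPADES }

final class Card {
    private final Suit suit;

    Card(Suit suit) {
        this.suit = suit;
    }

    @Override
    public int hashCode() {
        // Distinct enum constants have distinct ordinals, so this is both
        // trivially cheap and collision-free.
        return suit.ordinal();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Card && ((Card) o).suit == suit;
    }
}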
Sometimes memoising the value is a good solution: even a very inefficient method can be mitigated because the hash is only ever calculated once. There is an obvious memory cost to this which also must be balanced.
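A sketch of that memoisation, caching the result in a field the way java.lang.String does internally (the Document class and its content field are assumptions for illustration):

final class Document {
    private final byte[] content;   // assumed expensive-to-hash, immutable state
    private int hash;               // 0 means "not computed yet"

    Document(byte[] content) {
        this.content = content.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // The expensive part runs at most once per instance
            // (or again only in the rare case where the hash is exactly 0).
            h = java.util.Arrays.hashCode(content);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Document
                && java.util.Arrays.equals(((Document) o).content, content);
    }
}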
Sometimes the overall functionality of your code contributes to your choice. Say you want to put File objects in a HashMap. A number of options are clear; the sketch below illustrates a few, each trading speed against collision rate.
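For example, here are three hypothetical strategies (the FileKeyExamples class and its method names are mine, not from the original); which one is right depends on how the map is actually used:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

final class FileKeyExamples {

    // Option 1: hash only the name. Very fast, but every "readme.txt"
    // collides with every other "readme.txt" regardless of directory.
    static int nameHash(File f) {
        return f.getName().hashCode();
    }

    // Option 2: hash the whole path. Still cheap and far better dispersed,
    // but two different paths to the same physical file hash differently.
    static int pathHash(File f) {
        return f.getPath().hashCode();
    }

    // Option 3: hash the contents. Almost collision-free, but it reads the
    // entire file, so it only makes sense if the result is memoised.
    static int contentHash(File f) throws IOException {
        return Arrays.hashCode(Files.readAllBytes(f.toPath()));
    }
}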
Why collisions are bad
One of the main uses of hashCode is when inserting objects into a HashMap. The algorithm requests a hash code from the object and uses it to decide which bucket to put the object in. If the hash collides with that of an object already in the map, both end up in the same bucket, which then has to grow, and searching it costs time. If all hashes are unique then the map holds one item per bucket and is maximally efficient.
See the excellent Wikipedia article on hash tables for a deeper discussion of how HashMap works.
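To make the cost concrete, here is a hedged sketch (the BadKey class is invented for illustration): a constant hash code, like example 1 in the question, forces every entry into one bucket, so lookups degrade from O(1) towards O(n).

import java.util.HashMap;
import java.util.Map;

final class BadKey {
    private final int id;

    BadKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 5;   // every instance lands in the same bucket
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), "value" + i);
        }
        // Still correct, but each get() has to search one enormous bucket
        // (a linked list, or a tree since Java 8) instead of finding the
        // entry directly; a well-dispersed hash would make this near O(1).
        System.out.println(map.get(new BadKey(9_999)));
    }
}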
"I prioritised them based on their efficiency"
Your list is sorted by ascending efficiency, if by "efficiency" you mean the performance of your application as opposed to the latency of the hashCode method measured in isolation. A hash code with bad dispersion will result in a linear or near-linear search through a linked list inside HashMap, completely annulling the advantages of a hash table.
Especially note that, on today's architectures, computation is much cheaper than pointer dereferencing, and it comes at a fixed, low cost. A single cache miss costs as much as a thousand simple arithmetic operations, and each pointer dereference is a potential cache miss.