 

How to make an efficient hashCode?

I have three hashCode methods, shown below; I prioritised them based on their efficiency. I am wondering whether there is any other way to make a more efficient hashCode method.

1) public int hashCode() { //terrible
     return 5; 
   }
2) public int hashCode() { //a bit less terrible
     return name.length();
   }
3) public int hashCode() { //better
     final int prime = 31;
     int result = 1;
     result = prime * result + ((name == null) ? 0 : name.hashCode());
     return result;
   }
asked Aug 28 '15 by Jack


People also ask

How can HashMap be made more efficient?

Use wrappers for composite HashMap keys. Whenever a HashMap has a composite String key, use a small wrapper object instead of concatenating the strings to make the key. Doing so makes lookups faster and reduces the allocation rate.
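As a minimal sketch of such a wrapper (the OrderKey class and its fields are hypothetical), the two parts of the key stay in separate fields and are combined in hashCode, rather than being concatenated into a new String on every lookup:

    import java.util.Objects;

    // Hypothetical composite key: wraps the two parts instead of building customerId + "|" + region.
    final class OrderKey {
        private final String customerId;
        private final String region;

        OrderKey(String customerId, String region) {
            this.customerId = customerId;
            this.region = region;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof OrderKey)) return false;
            OrderKey other = (OrderKey) o;
            return customerId.equals(other.customerId) && region.equals(other.region);
        }

        @Override
        public int hashCode() {
            // Combines the field hashes without allocating a throwaway concatenated String.
            return Objects.hash(customerId, region);
        }
    }

A map is then queried with map.get(new OrderKey(customerId, region)), allocating only the small wrapper instead of a new concatenated String.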

Why is 31 used in hashCode?

The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
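For context, this is a sketch of the classic polynomial hash that String.hashCode() uses with 31 as the multiplier; since 31 * h equals (h << 5) - h, the VM can replace the multiplication with a shift and a subtraction.

    final class PolyHash {
        // h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
        static int polynomialHash(String s) {
            int h = 0;
            for (int i = 0; i < s.length(); i++) {
                h = 31 * h + s.charAt(i);   // 31 * h can be rewritten as (h << 5) - h
            }
            return h;
        }
    }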

What can cause a HashMap to be less efficient?

HashMap's bottleneck: because non-equal objects can have the same hash code (a phenomenon called a hash code collision), buckets can grow in size. A bucket is actually a simple linked list, and finding an element in a linked list isn't very fast (O(n)), but that's not a problem if the list stays small.


2 Answers

There is no surefire way to guarantee that your hashCode method is optimal, because it is measured against two different metrics.

  • Efficiency - how quickly it can be calculated.
  • Collisions - how likely it is to collide with another object's hash.

Your three methods:

  1. Maximises efficiency at the expense of collisions.
  2. Finds a spot somewhere in the middle - but still not good.
  3. Least efficient, but best for avoiding collisions - and still not necessarily best overall.

You have to find the balance yourself.

Sometimes the choice is obvious because there is a very efficient method that never collides (e.g. the ordinal of an enum).
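A minimal sketch of that case (the Move class and its field are hypothetical): an enum's ordinal is already a small, unique int within its type, so it doubles as a cheap, collision-free hash.

    enum Direction { NORTH, EAST, SOUTH, WEST }

    // Hypothetical class keyed by a single enum field.
    final class Move {
        private final Direction direction;

        Move(Direction direction) { this.direction = direction; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Move && ((Move) o).direction == direction;
        }

        @Override
        public int hashCode() {
            return direction.ordinal();   // unique per constant, trivially cheap to compute
        }
    }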

Sometimes memoising the value is a good solution - that way even a very inefficient method can be mitigated because it is only ever calculated once. There is an obvious memory cost to this which also must be balanced.
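Here is a sketch of that memoisation pattern, in the style of java.lang.String: the hash is computed lazily on first use and then cached, which is only safe if the object is immutable (the Document class is hypothetical).

    import java.util.Arrays;

    // Hypothetical immutable class with an expensive, cached hashCode.
    final class Document {
        private final byte[] content;   // never mutated after construction
        private int hash;               // 0 means "not computed yet" (same caveat as String)

        Document(byte[] content) { this.content = content.clone(); }

        @Override
        public int hashCode() {
            int h = hash;
            if (h == 0) {
                for (byte b : content) {
                    h = 31 * h + b;     // the expensive part runs at most once
                }
                hash = h;
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Document && Arrays.equals(((Document) o).content, content);
        }
    }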

Sometimes the overall functionality of your code contributes to your choice. Say you want to put File objects in a HashMap. A number of options are clear (two of them are sketched after the list):

  1. Use the hashcode of the file name.
  2. Use the hashcode of the file path.
  3. Use a CRC of the contents of the file.
  4. Use the hashcode of the SHA1 digest of the contents of the file.
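To make the trade-off concrete, here is a rough sketch of options 1 and 4 side by side (the FileKey wrapper is hypothetical): the first is almost free but collides for any two files sharing a name, while the second reads and digests the whole file.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Arrays;

    // Hypothetical wrapper around a file path, showing two hashing strategies.
    final class FileKey {
        private final Path path;

        FileKey(Path path) { this.path = path; }

        // Option 1: hash of the file name. Cheap, but "a/data.txt" and "b/data.txt" collide.
        int nameHash() {
            return path.getFileName().toString().hashCode();
        }

        // Option 4: hash of the SHA-1 digest of the contents. Expensive, but reflects the data itself.
        int contentHash() throws IOException, NoSuchAlgorithmException {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(Files.readAllBytes(path));
            return Arrays.hashCode(digest);
        }
    }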

Why collisions are bad

One of the main uses of hashCode is when inserting objects into a HashMap. The map requests a hash code from the object and uses it to decide which bucket to put the object in. If the hash collides with that of another object already in the map, both objects end up in the same bucket, which then has to grow and later be searched linearly, both of which cost time. If all hashes are unique then the map holds one item per bucket and is therefore maximally efficient.
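As a rough sketch of that bucket selection (the real HashMap additionally spreads the high bits of the hash before masking):

    final class Buckets {
        // Sketch: how a hash table turns a hashCode into a bucket index.
        // Assumes the table capacity is a power of two, as in HashMap.
        static int bucketIndex(Object key, int capacity) {
            int h = key.hashCode();
            return h & (capacity - 1);   // same as a modulo for power-of-two capacities
        }
    }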

See the excellent Wikipedia article on hash tables for a deeper discussion of how HashMap works.

answered Sep 29 '22 by OldCurmudgeon


"I prioritised them based on their efficiency"

Your list is sorted by ascending efficiency, if by "efficiency" you mean the performance of your application rather than the latency of the hashCode method isolated from everything else. A hashCode with bad dispersion will result in a linear or near-linear search through a linked list inside HashMap, completely annulling the advantages of a hash table.
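A small sketch of that degradation, using a hypothetical Key class with the constant hashCode from method 1 of the question: every entry lands in the same bucket, so each lookup has to search that single overloaded bucket instead of jumping straight to its entry.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical key whose hashCode never varies, like method 1 in the question.
    final class Key {
        private final String name;

        Key(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() { return 5; }   // every key collides with every other key
    }

    class Demo {
        public static void main(String[] args) {
            Map<Key, Integer> map = new HashMap<>();
            for (int i = 0; i < 100_000; i++) {
                map.put(new Key("k" + i), i);               // each put walks the same bucket
            }
            System.out.println(map.get(new Key("k99999"))); // searches that one bucket, not O(1)
        }
    }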

Note especially that, on today's architectures, computation is much cheaper than a pointer dereference, and it comes at a fixed, low cost. A single cache miss is worth a thousand simple arithmetic operations, and each pointer dereference is a potential cache miss.

answered Sep 29 '22 by Marko Topolnik