I have three hashCode methods as follows; I prioritised them based on their efficiency. I am wondering whether there is any other way to write a more efficient hashCode method.
1) public int hashCode() { // terrible
       return 5;
   }
2) public int hashCode() { // a bit less terrible
       return name.length();
   }
3) public int hashCode() { // better
       final int prime = 31;
       int result = 1;
       result = prime * result + ((name == null) ? 0 : name.hashCode());
       return result;
   }
Use wrappers for composite HashMap keys
Whenever a HashMap has composite String keys, use a wrapper object instead of concatenating the strings to make a key. Doing so makes lookups faster and reduces the allocation rate, because no temporary concatenated String has to be built for every get or put.
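A minimal sketch of such a wrapper key, assuming a hypothetical two-part name key (the class and field names are illustrative, not from the original):

// Hypothetical composite key: the two parts are wrapped instead of being
// concatenated into a new String for every lookup. Assumes non-null parts.
final class NameKey {
    private final String firstName;
    private final String lastName;

    NameKey(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    @Override
    public int hashCode() {
        // Combine both parts with the usual 31 multiplier; no allocation.
        return 31 * firstName.hashCode() + lastName.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NameKey)) return false;
        NameKey other = (NameKey) o;
        return firstName.equals(other.firstName) && lastName.equals(other.lastName);
    }
}

// Usage: map.get(new NameKey("Ada", "Lovelace"))
// rather than map.get("Ada" + "|" + "Lovelace").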
The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional.
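The same pattern extends to more than one field. A sketch (the Person class and its name/age fields are assumed for illustration, not taken from the question):

final class Person {
    private final String name;
    private final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((name == null) ? 0 : name.hashCode());
        result = prime * result + age;
        // Modern JVMs can rewrite 31 * result as (result << 5) - result,
        // so the odd-prime multiplier is essentially free.
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person p = (Person) o;
        return age == p.age
                && (name == null ? p.name == null : name.equals(p.name));
    }
}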
HashMap's Bottleneck
Because non-equal objects can have the same hash code (a phenomenon called a hash code collision), buckets can grow in size. A bucket is essentially a simple linked list (since Java 8, a bucket that grows past a threshold is converted to a balanced tree). Finding elements in a linked list is not very fast (O(n)), but that is not a problem if the list stays very small.
There is no surefire way to guarantee that your hashCode function is optimal, because it is measured by two different metrics: how quickly it can be computed, and how well it disperses values (i.e. how rarely it collides). You have to find the balance yourself.
Sometimes the choice is obvious because there is a very efficient method that never collides (e.g. the ordinal of an enum), as in the sketch below.
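A sketch of that case, assuming a hypothetical Card class whose identity is a single enum value:

enum Suit { CLUBS, DIAMONDS, HEARTS, SPADES }

final class Card {
    private final Suit suit;

    Card(Suit suit) {
        this.suit = suit;
    }

    @Override
    public int hashCode() {
        // Distinct enum constants have distinct ordinals, so this is both
        // trivially cheap and collision-free.
        return suit.ordinal();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Card && ((Card) o).suit == suit;
    }
}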
Sometimes memoising the value is a good solution: even a very inefficient method can be mitigated because the hash is only ever calculated once. There is an obvious memory cost to this which also must be balanced.
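A sketch of that memoisation, caching the result in a field the way java.lang.String does internally (the Document class and its content field are assumptions for illustration):

final class Document {
    private final byte[] content;   // assumed expensive-to-hash, immutable state
    private int hash;               // 0 means "not computed yet"

    Document(byte[] content) {
        this.content = content.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            // The expensive part runs at most once per instance
            // (or again only in the rare case where the hash is exactly 0).
            h = java.util.Arrays.hashCode(content);
            hash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Document
                && java.util.Arrays.equals(((Document) o).content, content);
    }
}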
Sometimes the overall functionality of your code contributes to your choice. Say you want to put File objects in a HashMap. A number of options are clear; the sketch below illustrates a few, each trading speed against collision rate.
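For example, here are three hypothetical strategies (the FileKeyExamples class and its method names are mine, not from the original); which one is right depends on how the map is actually used:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

final class FileKeyExamples {

    // Option 1: hash only the name. Very fast, but every "readme.txt"
    // collides with every other "readme.txt" regardless of directory.
    static int nameHash(File f) {
        return f.getName().hashCode();
    }

    // Option 2: hash the whole path. Still cheap and far better dispersed,
    // but two different paths to the same physical file hash differently.
    static int pathHash(File f) {
        return f.getPath().hashCode();
    }

    // Option 3: hash the contents. Almost collision-free, but it reads the
    // entire file, so it only makes sense if the result is memoised.
    static int contentHash(File f) throws IOException {
        return Arrays.hashCode(Files.readAllBytes(f.toPath()));
    }
}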
Why collisions are bad
One of the main uses of hashCode is when inserting objects into a HashMap. The algorithm requests a hash code from the object and uses it to decide which bucket to put the object in. If the hash collides with that of an object already in the map, both end up in the same bucket, which then has to grow, and searching it costs time. If all hashes are unique then the map holds one item per bucket and is maximally efficient.
See the excellent Wikipedia article on hash tables for a deeper discussion of how HashMap works.
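To make the cost concrete, here is a hedged sketch (the BadKey class is invented for illustration): a constant hash code, like example 1 in the question, forces every entry into one bucket, so lookups degrade from O(1) towards O(n).

import java.util.HashMap;
import java.util.Map;

final class BadKey {
    private final int id;

    BadKey(int id) {
        this.id = id;
    }

    @Override
    public int hashCode() {
        return 5;   // every instance lands in the same bucket
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new BadKey(i), "value" + i);
        }
        // Still correct, but each get() has to search one enormous bucket
        // (a linked list, or a tree since Java 8) instead of finding the
        // entry directly; a well-dispersed hash would make this near O(1).
        System.out.println(map.get(new BadKey(9_999)));
    }
}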
"I prioritised them based on their efficiency"
Your list is sorted by ascending efficiency, if by "efficiency" you mean the performance of your application as opposed to the latency of the hashCode method measured in isolation. A hash code with bad dispersion will result in a linear or near-linear search through a linked list inside HashMap, completely annulling the advantages of a hash table.
Especially note that, on today's architectures, computation is much cheaper than pointer dereferencing, and it comes at a fixed, low cost. A single cache miss costs as much as a thousand simple arithmetic operations, and each pointer dereference is a potential cache miss.