
What's behind the hashCode() method for String in Java? [duplicate]

Tags:

java

hashcode

I've been investigating the hashCode() methods in Java and found the one for the String class strange. The source code is as follows:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

The code itself is quite straightforward, but I wonder why the hash code is calculated this way.
Why choose 31?
Why start from index 0 instead of value.length - 1?
Is there any guarantee that this makes hash codes less likely to collide with each other?

asked Mar 20 '13 08:03 by HarryLv


People also ask

What is the purpose of the hashCode() method?

The purpose of the hashCode() method is to provide a numeric summary of an object's contents, so that hash-based collections can narrow down candidates quickly before calling equals(). By default, Object.hashCode() returns an identity-based value; it is often described as derived from the object's memory address, but the specification does not require that.

Can hashCode of two strings be same in Java?

Two equal strings must have the same hash code, but the converse is not true. A different string can produce the same hash code (a collision), so the original string cannot be derived from its hash code.
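A minimal sketch of such a collision, assuming the 31-based String.hashCode() quoted in the question. The class and method names are made up for illustration.

```java
class StringCollisionDemo {
    /** True when two strings differ in content but share a hash code. */
    static boolean collides(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("Aa".hashCode());      // 2112
        System.out.println("BB".hashCode());      // 2112
        System.out.println(collides("Aa", "BB")); // true
    }
}
```

Because such pairs exist, hash-based collections always confirm a match with equals() after comparing hash codes.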

Why is the hashCode method used in Java?

HashMap and HashSet use hashing to store and look up data. They call hashCode() to choose a bucket for each object. The default implementation of hashCode() in the Object class returns distinct integers for distinct objects as far as is reasonably practical.

What happens if the hashCode() method always returns the same value?

If multiple objects return the same value from hashCode(), they are stored in the same bucket. If many objects share a bucket, looking up a given object requires more equals() comparisons on average.
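As a rough illustration, here is a sketch with a hypothetical key class (BadKey, not from any library) whose hashCode() is deliberately constant. The map still returns correct results, because equals() settles each match, but every entry shares one bucket.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    /** Hypothetical key type whose hashCode() is the same for every instance. */
    static final class BadKey {
        final String name;

        BadKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // legal under the contract, but every key lands in one bucket
        }
    }

    /** Builds a map with n entries keyed "k0" .. "k(n-1)". */
    static Map<BadKey, Integer> build(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("k" + i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = build(1000);
        System.out.println(map.size());                  // 1000
        System.out.println(map.get(new BadKey("k500"))); // 500
    }
}
```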


1 Answer

In practice the probability of a hash code collision is low, but it is not zero. For String, the hash code depends only on the string's value. If a String is not created with the new operator and a literal with the same value already exists, no new String object is created; the reference points to the existing object in the string pool. Even when new does create a separate object, its hash code is the same as that of any other String with equal content, as the hashCode contract requires.
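A small check of how String hash codes follow content rather than object identity. This is only a sketch; the class and method names are illustrative.

```java
class EqualStringsDemo {
    /** True when both strings report the same hash code. */
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String sameLiteral = "hello";      // resolved to the same pooled object
        String copy = new String("hello"); // a separate object with equal content

        System.out.println(literal == sameLiteral);  // true
        System.out.println(literal == copy);         // false
        System.out.println(literal.equals(copy));    // true
        System.out.println(sameHash(literal, copy)); // true
    }
}
```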

The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

Since Java 1.2, the java.lang.String class has implemented hashCode() using a product-sum algorithm over the entire text of the string. An instance s of the java.lang.String class, for example, has a hash code h(s) defined by

h(s)=s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] 

where terms are summed using Java 32-bit int addition, s[i] denotes the ith character of the string, and n is the length of s.
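To make the formula concrete, the sketch below evaluates it term by term in 32-bit int arithmetic and compares the result with String.hashCode(). The helper name polynomialHash is hypothetical, and the inner loop is written for clarity rather than speed.

```java
class PolynomialHashDemo {
    /** Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with wrapping int arithmetic. */
    static int polynomialHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // 31^(n-1-i), allowed to overflow like any int product
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 97*961 + 98*31 + 99 = 93217 + 3038 + 99 = 96354
        System.out.println(polynomialHash("abc")); // 96354
        System.out.println("abc".hashCode());      // 96354
    }
}
```

Overflow does not break the agreement, since both computations wrap modulo 2^32 in the same way.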

For reference, the hashCode method in Apache Harmony is:

    public int hashCode() {
        if (hashCode == 0) {
            int hash = 0, multiplier = 1;
            for (int i = offset + count - 1; i >= offset; i--) {
                hash += value[i] * multiplier;
                int shifted = multiplier << 5;
                multiplier = shifted - multiplier;
            }
            hashCode = hash;
        }
        return hashCode;
    }
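The two snippets walk the characters in opposite directions but compute the same polynomial. The Harmony version also shows one commonly cited convenience of 31: multiplying by it can be written as a shift and a subtraction, since (m << 5) - m equals 32m - m. The sketch below checks both points; it simplifies the Harmony code by assuming the whole array is the string (no offset or count fields), and the method names are illustrative.

```java
class ShiftMultiplierDemo {
    /** Forward Horner form, as in the JDK snippet from the question. */
    static int forward(char[] value) {
        int h = 0;
        for (char c : value) {
            h = 31 * h + c;
        }
        return h;
    }

    /** Backward form with a running multiplier, modeled on the Harmony snippet. */
    static int backward(char[] value) {
        int hash = 0, multiplier = 1;
        for (int i = value.length - 1; i >= 0; i--) {
            hash += value[i] * multiplier;
            multiplier = (multiplier << 5) - multiplier; // same as multiplier * 31
        }
        return hash;
    }

    public static void main(String[] args) {
        String s = "hello world";
        char[] chars = s.toCharArray();
        System.out.println(forward(chars) == backward(chars)); // true
        System.out.println(forward(chars) == s.hashCode());    // true
    }
}
```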
answered Oct 06 '22 21:10 by Shreyos Adikari