Below is the source of the String.hashCode() method from Java 8 (1.8.0_131, to be precise):
/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}
As you can see, the documentation says that hashCode() is computed using the formula below:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
while the actual implementation looks different:
for (int i = 0; i < value.length; i++) {
    h = 31 * h + val[i];
}
Am I missing something obvious? Please help me.
The implementation is correct, with the caveat that integer overflow may occur (which is fine here; it does no harm). It uses Horner's method for polynomial evaluation.
Here are the steps for the sample string "CAT".
h = 0
First loop:
i = 0
h = 31 * 0 + 'C' (67) = 67
Second loop:
i = 1
h = 31 * 67 + 'A' (65) = 2142
Third loop:
i = 2
h = 31 * 2142 + 'T' (84) = 66486
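The trace above can be reproduced with a small sketch. The class and method names (HashTraceSketch, tracedHash) are hypothetical and not part of the JDK; the loop body is assumed to mirror the one in String.hashCode(), minus the caching in the hash field.

```java
class HashTraceSketch {
    // Hypothetical helper: runs the same recurrence as String.hashCode()
    // and prints the running value of h after every iteration.
    static int tracedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            h = 31 * h + c; // same step as in the JDK loop
            System.out.println("i = " + i + ", '" + c + "' (" + (int) c + "), h = " + h);
        }
        return h;
    }

    public static void main(String[] args) {
        int h = tracedHash("CAT");
        // Compare the hand-rolled result with the built-in method.
        System.out.println("matches String.hashCode(): " + (h == "CAT".hashCode()));
    }
}
```

For "CAT" the final value printed should be 66486, the same number reached by hand above.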
Let's derive the formula from the code. Here, n is the value of the loop index i, that is, the current position in the string s. Each iteration of the for loop applies this recurrence, starting from h = 0 before the first iteration:
hn = 31*h(n-1) + s[n]
h0 /* after loop i = 0 */ = s[0]
h1 /* after loop i = 1 */ = 31*h0 + s[1] = 31*s[0] + s[1]
h2 /* after loop i = 2 */ = 31*h1 + s[2] = 31*(31*s[0] + s[1]) + s[2]
h = 31*31*s[0] + 31*s[1] + s[2]
The exponents you see for the powers of 31 arise because each loop multiplies in another factor of 31
before adding the value of the next character.
It is easiest to see what happens with an example. Let's take a String s of length n, with all notation as above. We will analyze the loop iteration by iteration. We will call h_old the value h has at the beginning of the current iteration and h_new the value h has at the end of the current iteration. It is easy to see that h_new of iteration i will be h_old of iteration i + 1.
╔═════╦════════════════════════════╦═════════════════════════════════════════════════╗
║ It. ║ h_old                      ║ h_new                                           ║
╠═════╬════════════════════════════╬═════════════════════════════════════════════════╣
║  1  ║ 0                          ║ 31*h_old + s[0] =                               ║
║     ║                            ║ s[0]                                            ║
║     ║                            ║                                                 ║
║  2  ║ s[0]                       ║ 31*h_old + s[1] =                               ║
║     ║                            ║ 31*s[0] + s[1]                                  ║
║     ║                            ║                                                 ║
║  3  ║ 31*s[0] + s[1]             ║ 31^2*s[0] + 31*s[1] + s[2]                      ║
║     ║                            ║                                                 ║
║  4  ║ 31^2*s[0] + 31*s[1] + s[2] ║ 31^3*s[0] + 31^2*s[1] + 31*s[2] + s[3]          ║
║  :  ║ :                          ║ :                                               ║
║  n  ║ ...                        ║ 31^(n-1)*s[0] + 31^(n-2)*s[1] + ... + s[n-1]    ║
╚═════╩════════════════════════════╩═════════════════════════════════════════════════╝
(Table generated with Senseful)
The powers of 31 are created through the loop and the constant multiplication of h with 31 (making use of the distributivity of multiplication).
As we can see in the last row of the table, this is exactly what the documentation said it would be.
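As a rough check of that equivalence, the sketch below evaluates the documented formula term by term and compares it with the Horner-style loop and with the built-in String.hashCode(). The class and method names (FormulaCheckSketch, hornerHash, polynomialHash) are made up for illustration. Because int arithmetic in Java wraps around on overflow, both ways of computing the sum are expected to agree even for strings long enough to overflow.

```java
class FormulaCheckSketch {
    // Horner's method: the same loop shape as in String.hashCode().
    static int hornerHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // overflow wraps around silently
        }
        return h;
    }

    // The documented formula s[0]*31^(n-1) + ... + s[n-1],
    // evaluated one term at a time using int arithmetic.
    static int polynomialHash(String s) {
        int n = s.length();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int power = 1;
            for (int k = 0; k < n - 1 - i; k++) {
                power *= 31; // 31^(n-1-i), wrapping on overflow
            }
            sum += s.charAt(i) * power;
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] samples = {"", "CAT", "a string long enough to overflow int arithmetic"};
        for (String s : samples) {
            System.out.println(hornerHash(s) + " " + polynomialHash(s) + " " + s.hashCode());
        }
    }
}
```

Each printed line should show the same number three times. The nested loop in polynomialHash is deliberately naive; it is there only to mirror the documentation, while the Horner form gets the same result with a single multiplication per character.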