Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

String hashCode() documentation vs implementation

Below is the source code snippet of String.hashCode() method from Java 8 (1.8.0_131 to be precise)

/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

You can see that, the documentation says, that hashCode() is computed using below formula

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

while the actual implementation is different

for (int i = 0; i < value.length; i++) {
    h = 31 * h + val[i];
}

Am I missing any obvious thing? Please help me.

like image 668
sanbhat Avatar asked May 03 '18 18:05

sanbhat


2 Answers

The implementation is correct, with the caveat that integer overflow may occur (which is ok here, it doesn't harm anything). It's using Horner's method for polynomial evaluation.

Here's the steps on a sample string "CAT".

h = 0

First loop:

i = 0
h = 31 * 0 + 'C' (67) = 67

Second loop:

i = 1
h = 31 * 67 + 'A' (65) = 2142

Third loop:

i = 2
h = 31 * 2142 + 'T' (84) = 66486

Let's derive the formula from the code. Here, n is the index of i into the string s. Each iteration of the for loop performs this formula.

hn = 31hn-1 + sn

h0 /* after loop i = 0 */ = s[0]
h1 /* after loop i = 1 */ = 31*h0 + s[1] = 31*s[0] + s[1]
h2 /* after loop i = 2 */ = 31*h1 + s[2] = 31*(31*s[0] + s[1]) + s[2]
h = 31*31*s[0] + 31*s[1] + s[2]

The exponents you see for the powers of 31 arise because each loop multiplies in another factor of 31 before adding the value of the next character.

like image 95
rgettman Avatar answered Nov 15 '22 09:11

rgettman


It is easiest to see what happens with some example. Let's take a String s of length n and all notation as above. We will analyze the loop iteration for iteration. We will call h_old the value h has at the beginning of the current iteration and h_new the value h has at the end of the current iteration. It is easy to see that h_new of iteration i will be h_old of iteration i + 1.

╔═════╦════════════════════════════╦═════════════════════════════════════════════════╗
║ It. ║ h_old                      ║ h_new                                           ║
╠═════╬════════════════════════════╬═════════════════════════════════════════════════╣
║ 1   ║ 0                          ║ 31*h_old + s[0] =                               ║
║     ║                            ║          s[0]                                   ║
║     ║                            ║                                                 ║
║ 2   ║ s[0]                       ║ 31*h_old + s[1] =                               ║
║     ║                            ║ 31      *s[0] +          s[1]                   ║
║     ║                            ║                                                 ║
║ 3   ║ 31  *s[0] +    s[1]        ║ 31^2    *s[0] + 31      *s[1] +    s[2]         ║
║     ║                            ║                                                 ║
║ 4   ║ 31^2*s[0] + 31*s[1] + s[2] ║ 31^3    *s[0] + 31^2    *s[1] + 31*s[2] + s[3]  ║
║ :   ║ :                          ║ :                                               ║
║ n   ║ ...                        ║ 31^(n-1)*s[0] + 31^(n-2)*s[1] + ... + 31^0*s[n] ║
╚═════╩════════════════════════════╩═════════════════════════════════════════════════╝

(Table generated with Senseful)

The powers of 31 are created through the loop and the constant multiplication of h with 31 (making use of the distributivity of the multiplication). As we can see in the last row of the table, this is exactly what the documentation said it would be.

like image 33
Turing85 Avatar answered Nov 15 '22 07:11

Turing85