Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate the hash code of a string by hand?

Tags:

java

hash

I was wondering how to calculate the hash code for a given string by hand. I understand that in Java, you can do something like:

String me = "What you say what you say what?";  
long whatever = me.hashCode();

That's all good and dandy, but I was wondering how to do it by hand. I know the given formula for calculating the hash code of a string is something like:

S0 X 31 ^ (n-1) + S1 X 31 ^ (n-2) + .... + S(n-2) X 31 + S(n-1)  

Where S indicates the character in the string, and n is the length of the string. Using 16 bit unicode then, the first character from string me would be computed as:

87 X (31 ^ 34)

However, that creates an insanely large number. I can't imagine adding all the characters together like that. So, in order to calculate the lowest-order 32 bits result then, what would I do? Long whatever from above equals -957986661 and I'm not how to calculate that?

like image 376
thomascirca Avatar asked Sep 25 '10 20:09

thomascirca


3 Answers

Take a look at the source code of java.lang.String.

/**
 * Returns a hash code for this string. The hash code for a
 * <code>String</code> object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using <code>int</code> arithmetic, where <code>s[i]</code> is the
 * <i>i</i>th character of the string, <code>n</code> is the length of
 * the string, and <code>^</code> indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return  a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    int len = count;
    if (h == 0 && len > 0) {
        int off = offset;
        char val[] = value;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
like image 122
dty Avatar answered Oct 20 '22 23:10

dty


Most hash functions of this sort calculate the hash value modulo some large number (e.g. a large prime). This avoids overflows and keeps the range of values returned by the function within a specified range. But this also means an infinite range of input values will get a hash value from a finite set of possible values (i.e. [0,modulus)), hence the problem of hash collisions.

In this case, the code would look something like this:

   public int hash(String x){
        int hashcode=0;
        int MOD=10007;
        int shift=29;
        for(int i=0;i<x.length();i++){
            hashcode=((shift*hashcode)%MOD+x.charAt(i))%MOD;
        }
        return hashcode; 
    }

Exercise for the reader:

See the code for the hashCode function for java.util.String. Can you see why it does not use a modulus explicitly?

like image 34
MAK Avatar answered Oct 20 '22 23:10

MAK


The following statements will find the string hashCode

String str="Hi";

int a = str.hashCode();//returns 2337

Let's check how exactly its calculated

HashCode = s[0]*31(n-1) + s[1]*31(n-2) + .. s(n-2)

As we all know that the character at position 0 is H, Character at position 1 is i, and the string length is 2.

==> H*31(2-1) + i*31(2-2)

As we all know that, ASCII code of H is 72, and i is 105. It means,

==> 72 * 31 + 105 * 1 (Anything Power 0 is 1)

==> 2232 + 105 = 2337

Source: https://www.tutorialgateway.org/find-string-hashcode-in-java/

like image 26
Vivek Singh Avatar answered Oct 20 '22 22:10

Vivek Singh