Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash code non-zero initial value - note: I am not asking about primes

This is kind of an academic point, but I feel I don't fully understand hash codes if I don't understand why this is recommended by books such as Effective Java and many SO questions.

Suppose:

public sealed class Point
{
    private readonly int x;
    private readonly int y;

    //constructor ommited

    //equals ommited
    
    public override int GetHashcode()
    {
       int hash = 17; //why should the initial value be non-zero?
       unchecked
       {
         hash = hash * 31 + x; //do not tell me why I should use primes - that is not the question
         hash = hash * 31 + y;
         return hash;
       }
    }
}

Now, supposedly, the reason for the initial value is that it reduces collisions where one of the components is zero.

I'm struggling to find any example where this helps.

Here is one example of a collision, but having an initial value makes no odds.

x   y   Hash Without initial value     Hash With initial value  
0   31  31                             16368                
1   0   31                             16368                

Ideally, I'm looking for a concrete example where the initial value prevents a collision.

My theory on why an initial value can never make a difference

//Given a prime p, initial value i, fields a,b,c, calculate hash h
h = i;
h = h*p + a;
h = h*p + b;
h = h*p + c;

Therefore:

h = ((i*p + a)*p + b)*p + c
  = (ipp + ap + b   )*p + c
  = ippp + app + bp + c

Therefore the inital value i will effect all hash codes in the same way by producing a constant value, in this case i*p3.

like image 451
weston Avatar asked Nov 19 '12 15:11

weston


2 Answers

The initial value must be a prime number. Why? Because say you are hashing to get an index for an array of length = 20: [object.getHash()%20] is the index of the array where you will want to store your object. If you had used an even number: half of the addresses of your data structure would never be used...this is why you need to use an initial value: to minimize collisions...and maximize data structure usage

like image 144
doctor killer Avatar answered Sep 29 '22 09:09

doctor killer


Using prime numbers have shown via experiment and testing that have good properties for hash functions.
Also hard-coded numbers you see in existing libraries e.g. 31 in Java have been found during testing that they are good options. As far as I know there is no some proof behind the choices of these "magic" numbers.They were selected only after field testing

Update:
If you use zero as initial value then your hash will be affected by member variables also zero.
E.g. hash = hash * 31 + x; will be 0 if x is 0 and your initial value is also 0.
Then you end up with y which could be 0 as well or a number that could happen to be very common in your application domain and end up with collisions

like image 28
Cratylus Avatar answered Sep 29 '22 08:09

Cratylus