This is kind of an academic point, but I feel I don't fully understand hash codes if I don't understand why this is recommended by books such as Effective Java and many SO questions.
Suppose:
public sealed class Point
{
private readonly int x;
private readonly int y;
//constructor ommited
//equals ommited
public override int GetHashcode()
{
int hash = 17; //why should the initial value be non-zero?
unchecked
{
hash = hash * 31 + x; //do not tell me why I should use primes - that is not the question
hash = hash * 31 + y;
return hash;
}
}
}
Now, supposedly, the reason for the initial value is that it reduces collisions where one of the components is zero.
I'm struggling to find any example where this helps.
Here is one example of a collision, but having an initial value makes no odds.
x y Hash Without initial value Hash With initial value
0 31 31 16368
1 0 31 16368
Ideally, I'm looking for a concrete example where the initial value prevents a collision.
My theory on why an initial value can never make a difference
//Given a prime p, initial value i, fields a,b,c, calculate hash h
h = i;
h = h*p + a;
h = h*p + b;
h = h*p + c;
Therefore:
h = ((i*p + a)*p + b)*p + c
= (ipp + ap + b )*p + c
= ippp + app + bp + c
Therefore the inital value i
will effect all hash codes in the same way by producing a constant value, in this case i*p
3.
The initial value must be a prime number. Why? Because say you are hashing to get an index for an array of length = 20: [object.getHash()%20] is the index of the array where you will want to store your object. If you had used an even number: half of the addresses of your data structure would never be used...this is why you need to use an initial value: to minimize collisions...and maximize data structure usage
Using prime numbers have shown via experiment and testing that have good properties for hash functions.
Also hard-coded numbers you see in existing libraries e.g. 31
in Java
have been found during testing that they are good options. As far as I know there is no some proof behind the choices of these "magic" numbers.They were selected only after field testing
Update:
If you use zero as initial value then your hash will be affected by member variables also zero.
E.g. hash = hash * 31 + x;
will be 0
if x
is 0
and your initial value is also 0
.
Then you end up with y
which could be 0
as well or a number that could happen to be very common in your application domain and end up with collisions
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With