Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Will this hash function collide unusually frequently?

I had the following code to generate a hash of an object:

public int GetHashCode(MyType obj)
{
   return (obj.Prop1.GetHashCode() + obj.Prop2.GetHashCode() + obj.Prop3.GetHashCode()).GetHashCode();
}

I.e. I add all the properties' hash codes and then take the hash of this.

In review, a coworker suggested that this will collide too frequently. I'm not sure that this is true because:

  1. Given that hash codes are chosen with equal frequency among positive and negative numbers and they wrap around, I don't think there's any additional information we gain about the likelihood of these numbers' sum as opposed to the numbers themselves
  2. To the extent that their sum is non-random, hash codes are designed to make numbers that are "close together" become "far apart", so feeding a non-uniformly-distributed value into the function shouldn't be an issue

Who is correct?

It is in C#, in case the answer is language-specific.

like image 800
Xodarap Avatar asked Jun 08 '11 21:06

Xodarap


2 Answers

Yes.

Just suppose Prop1, Prop2 etc are of type int. Usually only the lower range of integers is used. Your sum approach will collide more often than necessary.

The HasCode of 7 is 7, which makes perfect sense when hashing int by it self. But with your code the tuples <7, 3>, <3, 7> and <8, 2> will all have the same Hash. The same with simple XOR instead of Addition.

The common approach is to add some (prime) numbers and shifting:

public int GetHashCode(MyType obj)
{
  int hash = 0;
  unchecked
  {         
     hash += 19 * obj.Prop1.GetHashCode();
     hash += 31 * obj.Prop2.GetHashCode();
     hash += 37 * obj.Prop3.GetHashCode();
  }
  return hash;
}

The numbers 19, 31, 37 are not too critical. And if you prefer you can use OR or XOR instead of + .

like image 73
Henk Holterman Avatar answered Nov 15 '22 11:11

Henk Holterman


XORing would be better:

public int GetHashCode(MyType obj)
{
   return obj.Prop1.GetHashCode() ^ 
          obj.Prop2.GetHashCode() ^ 
          obj.Prop3.GetHashCode();
}
like image 36
Darin Dimitrov Avatar answered Nov 15 '22 11:11

Darin Dimitrov