
Why does the initial hash value in the GetHashCode() implementation generated for an anonymous class depend on property names?

Tags:

c#

roslyn

When generating the GetHashCode() implementation for an anonymous class, Roslyn computes the initial hash value from the property names. For example, the class generated for

var x = new { Int = 42, Text = "42" };

is going to have the following GetHashCode() method:

public override int GetHashCode()
{
   int hash = 339055328;
   hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode( Int );
   hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode( Text );
   return hash;
}

But if we change the property names, the initial value changes:

var x = new { Int2 = 42, Text2 = "42" };

public override int GetHashCode()
{
   int hash = 605502342;
   hash = hash * -1521134295 + EqualityComparer<int>.Default.GetHashCode( Int2 );
   hash = hash * -1521134295 + EqualityComparer<string>.Default.GetHashCode( Text2 );
   return hash;
}

What's the reason behind this behaviour? Is there some problem with just picking a big [prime?] number and using it for all the anonymous classes?

HellBrick asked Sep 27 '15 13:09


1 Answer

Is there some problem with just picking a big [prime?] number and using it for all the anonymous classes?

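There is nothing wrong with doing this; it just tends to produce a less efficient hash, because values of different anonymous types that hold the same property values would then all get the same hash code.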

The goal of a GetHashCode implementation is to return, as far as possible, different results for values which are not equal. This decreases the chance of collisions when the values are used in hash-based collections (such as Dictionary<TKey, TValue>).
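To make the collection scenario concrete, here is a minimal sketch that uses the two anonymous values from the question as keys in one dictionary. The class and method names (HashLookupDemo, BuildLookup) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HashLookupDemo
{
    // Hypothetical helper: stores one value of each anonymous shape as a key.
    public static Dictionary<object, string> BuildLookup()
    {
        var lookup = new Dictionary<object, string>();
        lookup[new { Int = 42, Text = "42" }] = "first shape";
        lookup[new { Int2 = 42, Text2 = "42" }] = "second shape";
        return lookup;
    }

    public static void Main()
    {
        var lookup = BuildLookup();

        // The two keys are never equal, so both entries are kept.
        Console.WriteLine(lookup.Count); // 2

        // A fresh value of the same shape with the same contents is equal
        // to the stored key and has the same hash code, so it is found.
        Console.WriteLine(lookup[new { Int = 42, Text = "42" }]); // first shape
    }
}
```

The dictionary works correctly whatever the hash codes are; the hash codes only determine how often unequal keys end up being compared with each other.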

An anonymous value can never be equal to another anonymous value if they represent different types. The type of an anonymous value is defined by the shape of the properties:

  • Name of properties
  • Type of properties
  • Count of properties

Two anonymous values which differ in any of these characteristics represent different types and hence can never be equal values.
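A small check of this rule, assuming all three values are created in the same assembly (the class name AnonymousTypeShapeDemo is just for illustration):

```csharp
using System;

public static class AnonymousTypeShapeDemo
{
    public static void Main()
    {
        var a = new { Int = 42, Text = "42" };
        var b = new { Int2 = 42, Text2 = "42" }; // same values, different names
        var c = new { Int = 42, Text = "42" };   // same shape as a

        // Different property names give a different compiler-generated type.
        Console.WriteLine(a.GetType() == b.GetType()); // False
        Console.WriteLine(a.Equals(b));                // False

        // Same names, types and order reuse the same type, and Equals
        // compares the property values.
        Console.WriteLine(a.GetType() == c.GetType()); // True
        Console.WriteLine(a.Equals(c));                // True
    }
}
```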

Given this, it makes sense for the compiler to generate GetHashCode implementations which tend to return different values for different types. This is why the compiler includes the property names when computing the initial hash.
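The effect of the seed can be sketched by hand. The snippet below is not the compiler's code: Combine is a hypothetical helper that repeats the two combining steps from the question's generated method with the seed passed in, sharedSeed is an arbitrary made-up constant, and the two per-type seeds are simply copied from the question. How Roslyn derives those seeds from the property names is not shown here.

```csharp
using System;
using System.Collections.Generic;

public static class SeedComparisonDemo
{
    private const int Factor = -1521134295; // multiplier from the question

    // Hypothetical helper: same combining steps as the generated method,
    // but with the initial value supplied by the caller.
    public static int Combine(int seed, int number, string text)
    {
        unchecked
        {
            int hash = seed;
            hash = hash * Factor + EqualityComparer<int>.Default.GetHashCode(number);
            hash = hash * Factor + EqualityComparer<string>.Default.GetHashCode(text);
            return hash;
        }
    }

    public static void Main()
    {
        const int sharedSeed = 17;             // assumption: one seed for every type
        const int seedIntText = 339055328;     // from the question: { Int, Text }
        const int seedInt2Text2 = 605502342;   // from the question: { Int2, Text2 }

        // With a shared seed, both shapes hash identically for the same values.
        Console.WriteLine(Combine(sharedSeed, 42, "42") == Combine(sharedSeed, 42, "42")); // True

        // With per-type seeds, the same values give different hash codes.
        Console.WriteLine(Combine(seedIntText, 42, "42") == Combine(seedInt2Text2, 42, "42")); // False
    }
}
```

Because the multiplier is odd, multiplying by it never maps two different 32-bit values to the same result, so two different seeds combined with identical property hashes always end in different final values.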

JaredPar answered Nov 14 '22 23:11