Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is ValueType.GetHashCode() implemented like it is?

Tags:

c#

gethashcode

From ValueType.cs

 **Action: Our algorithm for returning the hashcode is a little bit complex. We look  **        for the first non-static field and get it's hashcode.  If the type has no  **        non-static fields, we return the hashcode of the type. We can't take the **        hashcode of a static member because if that member is of the same type as  **        the original type, we'll end up in an infinite loop. 

I got bitten by this today when I was using a KeyValuePair as a key in a Dictionary (it stored xml attribute name (enum) and it's value (string)), and expected for it to have it's hashcode computed based on all its fields, but according to implementation it only considered the Key part.

Example (c/p from Linqpad):

void Main() {     var kvp1 = new KeyValuePair<string, string>("foo", "bar");     var kvp2 = new KeyValuePair<string, string>("foo", "baz");      // true     (kvp1.GetHashCode() == kvp2.GetHashCode()).Dump(); } 

The first non-static field I guess means the first field in declaratin order, which could also cause trouble when changing variable order in source for whatever reason, and believing it doesn't change the code semantically.

like image 566
alh84001 Avatar asked Oct 01 '10 17:10

alh84001


People also ask

What is the purpose of GetHashCode?

The GetHashCode method provides this hash code for algorithms that need quick checks of object equality. For information about how hash codes are used in hash tables and for some additional hash code algorithms, see the Hash Function entry in Wikipedia. Two objects that are equal return hash codes that are equal.

How does GetHashCode work C#?

The GetHashCode method provides this hash code for algorithms that need quick checks of object equality. Syntax: public virtual int GetHashCode (); Return Value: This method returns a 32-bit signed integer hash code for the current object.


Video Answer


1 Answers

The actual implementation of ValueType.GetHashCode() doesn't quite match the comment. It has two versions of the algorithm, fast and slow. It first checks if the struct contains any members of a reference type and if there is any padding between the fields. Padding is empty space in a structure value, created when the JIT compiler aligns the fields. There's padding in a struct that contains bool and int (3 bytes) but no padding when it contains int and int, they fit snugly together.

Without a reference and without padding, it can do the fast version since every bit in the structure value is a bit that belongs to a field value. It simply xors 4 bytes at a time. You'll get a 'good' hash code that considers all the members. Many simple structure types in the .NET framework behave this way, like Point and Size.

Failing that test, it does the slow version, the moral equivalent of reflection. That's what you get, your KeyValuePair<> contains references. And this one only checks the first candidate field, like the comment says. This is surely a perf optimization, avoiding burning too much time.

Yes, nasty detail and not that widely known. It is usually discovered when somebody notices that their collection code sucks mud.

One more excruciating detail: the fast version has a bug that bytes when the structure contains a field of a type decimal. The values 12m and 12.0m are logically equal but they don't have the same bit pattern. GetHashCode() will say that they are not equal. Ouch.

like image 104
Hans Passant Avatar answered Sep 21 '22 15:09

Hans Passant