Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

General advice and guidelines on how to properly override object.GetHashCode()

People also ask

When should we override the GetHashCode () method?

It's my understanding that the original GetHashCode() returns the memory address of the object, so it's essential to override it if you wish to compare two different objects. EDITED: That was incorrect, the original GetHashCode() method cannot assure the equality of 2 values.

How is GetHashCode implemented?

For example, the implementation of the GetHashCode() method provided by the String class returns identical hash codes for identical string values. Therefore, two String objects return the same hash code if they represent the same string value.

How does GetHashCode work C#?

The GetHashCode method provides this hash code for algorithms that need quick checks of object equality. Syntax: public virtual int GetHashCode (); Return Value: This method returns a 32-bit signed integer hash code for the current object.

Why do we need GetHashCode C#?

GetHashCode returns a value based on the current instance that is suited for hashing algorithms and data structures such as a hash table. Two objects that are the same type and are equal must return the same hash code to ensure that instances of System.


Table of contents

  • When do I override object.GetHashCode?

  • Why do I have to override object.GetHashCode()?

  • What are those magic numbers seen in GetHashCode implementations?


Things that I would like to be covered, but haven't been yet:

  • How to create the integer (How to "convert" an object into an int wasn't very obvious to me anyways).
  • What fields to base the hash code upon.
    • If it should only be on immutable fields, what if there are only mutable ones?
  • How to generate a good random distribution. (MSDN Property #3)
    • Part to this, seems to choose a good magic prime number (have seen 17, 23 and 397 been used), but how do you choose it, and what is it for exactly?
  • How to make sure the hash code stays the same all through the object lifetime. (MSDN Property #2)
    • Especially when the equality is based upon mutable fields. (MSDN Property #1)
  • How to deal with fields that are complex types (not among the built-in C# types).
    • Complex objects and structs, arrays, collections, lists, dictionaries, generic types, etc.
    • For example, even though the list or dictionary might be readonly, that doesn't mean the contents of it are.
  • How to deal with inherited classes.
    • Should you somehow incorporate base.GetHashCode() into your hash code?
  • Could you technically just be lazy and return 0? Would heavily break MSDN guideline number #3, but would at least make sure #1 and #2 were always true :P
  • Common pitfalls and gotchas.

What are those magic numbers often seen in GetHashCode implementations?

They are prime numbers. Prime numbers are used for creating hash codes because prime number maximize the usage of the hash code space.

Specifically, start with the small prime number 3, and consider only the low-order nybbles of the results:

  • 3 * 1 = 3 = 3(mod 8) = 0011
  • 3 * 2 = 6 = 6(mod 8) = 1010
  • 3 * 3 = 9 = 1(mod 8) = 0001
  • 3 * 4 = 12 = 4(mod 8) = 1000
  • 3 * 5 = 15 = 7(mod 8) = 1111
  • 3 * 6 = 18 = 2(mod 8) = 0010
  • 3 * 7 = 21 = 5(mod 8) = 1001
  • 3 * 8 = 24 = 0(mod 8) = 0000
  • 3 * 9 = 27 = 3(mod 8) = 0011

And we start over. But you'll notice that successive multiples of our prime generated every possible permutation of bits in our nybble before starting to repeat. We can get the same effect with any prime number and any number of bits, which makes prime numbers optimal for generating near-random hash codes. The reason we usually see larger primes instead of small primes like 3 in the example above is that, for greater numbers of bits in our hash code, the results obtained from using a small prime are not even pseudo-random - they're simply an increasing sequence until an overflow is encountered. For optimal randomness, a prime number that results in overflow for fairly small coefficients should be used, unless you can guarantee that your coefficients will not be small.

Related links:

  • Why is ‘397’ used for ReSharper GetHashCode override?

Why do I have to override object.GetHashCode()?

Overriding this method is important because the following property must always remain true:

If two objects compare as equal, the GetHashCode method for each object must return the same value.

The reason, as stated by JaredPar in a blog post on implementing equality, is that

Many classes use the hash code to classify an object. In particular hash tables and dictionaries tend to place objects in buckets based on their hash code. When checking if an object is already in the hash table it will first look for it in a bucket. If two objects are equal but have different hash codes they may be put into different buckets and the dictionary would fail to lookup the object.

Related links:

  • Do I HAVE to override GetHashCode and Equals in new Classes?
  • Do I need to override GetHashCode() on reference types?
  • Override Equals and GetHashCode Question
  • Why is it important to override GetHashCode when Equals method is overriden in C#?
  • Properly Implementing Equality in VB

Check out Guidelines and rules for GetHashCode by Eric Lippert