Should the hash code of null always be zero in .NET?

Tags: c#, .net, null, hash

Given that collections like System.Collections.Generic.HashSet<> accept null as a set member, one can ask what the hash code of null should be. It looks like the framework uses 0:

    // nullable struct type
    int? i = null;
    i.GetHashCode();  // gives 0
    EqualityComparer<int?>.Default.GetHashCode(i);  // gives 0

    // class type
    CultureInfo c = null;
    EqualityComparer<CultureInfo>.Default.GetHashCode(c);  // gives 0

This can be (a little) problematic with nullable enums. If we define

    enum Season
    {
      Spring,
      Summer,
      Autumn,
      Winter,
    }

then the Nullable<Season> (also called Season?) can take just five values, but two of them, namely null and Season.Spring, have the same hash code.
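The collision is easy to reproduce. The following is a minimal sketch, not part of the original question: the class name NullableEnumHashDemo and the helper DefaultHash are made up for illustration, and the enum is redeclared so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

enum Season { Spring, Summer, Autumn, Winter }

// Hypothetical demo class; only EqualityComparer<T>.Default is framework API.
static class NullableEnumHashDemo
{
    // Hash code the default comparer assigns to a Season? value.
    public static int DefaultHash(Season? s) =>
        EqualityComparer<Season?>.Default.GetHashCode(s);

    static void Main()
    {
        Console.WriteLine(DefaultHash(null));           // 0
        Console.WriteLine(DefaultHash(Season.Spring));  // 0 (underlying value 0)
        Console.WriteLine(DefaultHash(Season.Summer));  // 1
    }
}
```

Five possible values, but only four distinct hash codes.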

It is tempting to write a "better" equality comparer like this:

    class NewNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
    {
      public override bool Equals(T? x, T? y)
      {
        return Default.Equals(x, y);
      }
      public override int GetHashCode(T? x)
      {
        return x.HasValue ? Default.GetHashCode(x) : -1;
      }
    }

But is there any reason why the hash code of null should be 0?

EDIT/ADDITION:

Some people seem to think this is about overriding Object.GetHashCode(). It really is not, actually. (The authors of .NET did make an override of GetHashCode() in the Nullable<> struct which is relevant, though.) A user-written implementation of the parameterless GetHashCode() can never handle the situation where the object whose hash code we seek is null.

This is about implementing the abstract method EqualityComparer<T>.GetHashCode(T) or otherwise implementing the interface method IEqualityComparer<T>.GetHashCode(T). Now, while creating these links to MSDN, I see that the documentation says these methods throw an ArgumentNullException if their sole argument is null. This must certainly be a mistake on MSDN? None of .NET's own implementations throw exceptions. Throwing in that case would effectively break any attempt to add null to a HashSet<>, unless HashSet<> does something extraordinary when dealing with a null item (I will have to test that).

NEW EDIT/ADDITION:

Now I tried debugging. With HashSet<>, I can confirm that with the default equality comparer, the values Season.Spring and null will end in the same bucket. This can be determined by very carefully inspecting the private array members m_buckets and m_slots. Note that the indices are always, by design, offset by one.

The code I gave above does not, however, fix this. As it turns out, HashSet<> will never even ask the equality comparer when the value is null. This is from the source code of HashSet<>:

    // Workaround Comparers that throw ArgumentNullException for GetHashCode(null).
    private int InternalGetHashCode(T item) {
        if (item == null) {
            return 0;
        }
        return m_comparer.GetHashCode(item) & Lower31BitMask;
    }

This means that, at least for HashSet<>, it is not even possible to change the hash of null. Instead, a solution is to change the hash of all the other values, like this:

    class NewerNullEnumEqComp<T> : EqualityComparer<T?> where T : struct
    {
      public override bool Equals(T? x, T? y)
      {
        return Default.Equals(x, y);
      }
      public override int GetHashCode(T? x)
      {
        return x.HasValue ? 1 + Default.GetHashCode(x) : /* not seen by HashSet: */ 0;
      }
    }
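
To check both claims (the shifted hash, and HashSet<> bypassing the comparer for null), a comparer can record whether it was ever handed null. This is only a sketch under assumptions: ShiftedNullableComparer, its SawNull flag and ShiftedComparerDemo are hypothetical names invented here, and the unchecked addition is a precaution of this sketch rather than something the code above does.

```csharp
using System;
using System.Collections.Generic;

enum Season { Spring, Summer, Autumn, Winter }

// Hypothetical comparer for this sketch: shifts every non-null hash by one
// and remembers whether GetHashCode was ever called with null.
class ShiftedNullableComparer<T> : EqualityComparer<T?> where T : struct
{
    public bool SawNull { get; private set; }

    public override bool Equals(T? x, T? y) => Default.Equals(x, y);

    public override int GetHashCode(T? x)
    {
        if (!x.HasValue)
        {
            SawNull = true;
            return 0;
        }
        // unchecked: wrap around instead of throwing in a checked context.
        return unchecked(1 + Default.GetHashCode(x));
    }
}

static class ShiftedComparerDemo
{
    static void Main()
    {
        var comparer = new ShiftedNullableComparer<Season>();
        var set = new HashSet<Season?>(comparer) { null, Season.Spring };

        Console.WriteLine(set.Count);                           // 2
        Console.WriteLine(comparer.GetHashCode(Season.Spring)); // 1
        // Stays false if HashSet<> short-circuits null as in the source quoted above.
        Console.WriteLine(comparer.SawNull);
    }
}
```

With this comparer, null keeps hash 0 while Season.Spring moves to 1, so the two no longer share a hash code.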

Asked by Jeppe Stig Nielsen, May 23 '12 15:05




1 Answer

So long as the hash code returned for nulls is consistent for the type, you should be fine. The only requirement for a hash code is that two objects that are considered equal share the same hash code.

Returning 0 or -1 for null will work, so long as you choose one and return it every time. Ideally, no non-null value should hash to whatever value you use for null.

Similar questions:

GetHashCode on null fields?

What should GetHashCode return when object's identifier is null?

The "Remarks" section of this MSDN entry goes into more detail about the hash code. Notably, the documentation does not cover or discuss null values at all, not even in the community content.

To address your issue with the enum, either re-implement the hash code to return non-zero, add a default "unknown" enum entry equivalent to null, or simply don't use nullable enums.
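
As a rough sketch of the second option, assuming you are free to change the enum: the type SeasonOrUnknown and the helper below are hypothetical, not from the question. A leading Unknown member takes the value 0, so every real season gets a non-zero hash code and no nullable wrapper is needed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replacement for Season?: Unknown stands in for null.
enum SeasonOrUnknown { Unknown, Spring, Summer, Autumn, Winter }

static class UnknownMemberDemo
{
    // Counts how many distinct hash codes the enum members produce.
    public static int DistinctHashCount()
    {
        var seen = new HashSet<int>();
        foreach (SeasonOrUnknown s in Enum.GetValues(typeof(SeasonOrUnknown)))
        {
            seen.Add(s.GetHashCode());
        }
        return seen.Count;
    }

    static void Main()
    {
        Console.WriteLine(DistinctHashCount()); // 5: one hash code per member
    }
}
```

The trade-off is that "no value" becomes an ordinary enum member that callers must remember to handle.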

Interesting find, by the way.

Another problem I see with this generally is that the hash code cannot represent a nullable type of 4 bytes or more without at least one collision (more as the type size increases). For example, the hash code of an int is just the int, so it uses the full int range. What value in that range do you choose for null? Whichever one you pick is already the hash code of some non-null int.
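
A short sketch of that pigeonhole effect for int? (the class and helper names are made up for illustration):

```csharp
using System;

static class NullableIntCollisionDemo
{
    // Nullable<int>.GetHashCode(): the wrapped int's hash, or 0 for null.
    public static int HashOf(int? value) => value.GetHashCode();

    static void Main()
    {
        Console.WriteLine(HashOf(null)); // 0
        Console.WriteLine(HashOf(0));    // 0, collides with null
        Console.WriteLine(HashOf(42));   // 42
        Console.WriteLine(HashOf(-1));   // -1, so -1 is taken as well
    }
}
```

There are 2^32 + 1 possible int? values and only 2^32 hash codes, so whichever code null gets, some int already owns it.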

Collisions in and of themselves are not necessarily a problem, but you need to know they are there. Hash codes are only used in some circumstances. As stated in the docs on MSDN, hash codes are not guaranteed to differ for different objects, so you should not expect them to.

Answered by Adam Houldsworth, Sep 18 '22 12:09