
Is Nullable<T>.GetHashCode() a poor hash code function?

Tags:

c#

.net

hashcode

The implementation of Nullable<T>.GetHashCode() is as follows:

public override int GetHashCode()
{
    if (!this.HasValue)
    {
        return 0;
    }
    return this.value.GetHashCode();
}

If, however, the underlying value also generates a hash code of 0 (e.g. a bool set to false or an Int32 set to 0), then we have two commonly occurring, distinct object states with the same hash code. It seems to me that a better implementation would have been something like:

public override int GetHashCode()
{
    if (!this.HasValue)
    {
        // Some arbitrary 32-bit int with a good mix of set and unset bits.
        // 0xD523648A exceeds int.MaxValue, so it needs an unchecked cast.
        return unchecked((int)0xD523648A);
    }
    return this.value.GetHashCode();
}
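The collision described in the question is easy to observe by printing the built-in hash codes. This is a minimal sketch that relies only on standard Nullable<T>, Int32 and Boolean behavior; the variable names are illustrative.

```csharp
using System;

// With the built-in Nullable<T>.GetHashCode(), "no value" and the most
// common "zero-like" values all produce the same hash code.
int? noInt = null;
int? zeroInt = 0;
bool? noBool = null;
bool? falseBool = false;

Console.WriteLine(noInt.GetHashCode());     // 0 (empty Nullable<T>)
Console.WriteLine(zeroInt.GetHashCode());   // 0 (Int32 hashes to its own value)
Console.WriteLine(noBool.GetHashCode());    // 0 (empty Nullable<T>)
Console.WriteLine(falseBool.GetHashCode()); // 0 (false hashes to 0, true to 1)
```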
asked Nov 23 '12 11:11 by redcalx



3 Answers

Yes, you do have a point. It is always possible to write a better GetHashCode() implementation if you know up front what data you are going to store, but that is not a luxury a library writer ever has. And yes, if you have a lot of bool? values that are either false or !HasValue, the default implementation is going to hurt. The same goes for enums and ints, where zero is a common value.
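If you do know your data up front, you can compensate in your own types instead of waiting for a framework change. One place where this tends to bite is a composite key: Dictionary<TKey,TValue> rejects a null key outright, but a key such as (null, 5) is accepted, and with the built-in hashes it always collides with (0, 5) because both parts contribute identical hash codes. A minimal sketch, assuming a hypothetical OptionalPairKey type and an arbitrary sentinel constant (requires System.HashCode, available since .NET Core 2.1):

```csharp
using System;

// Hypothetical key type with two optional parts. With the default
// Nullable<T>.GetHashCode(), (null, 5) and (0, 5) feed identical element
// hashes (0 and 5) into the combiner, so they always collide.
readonly struct OptionalPairKey : IEquatable<OptionalPairKey>
{
    // Illustrative sentinel for "no value"; any constant unlikely to match real data works.
    private const int NoValueHash = unchecked((int)0xD523648A);

    public int? X { get; }
    public int? Y { get; }

    public OptionalPairKey(int? x, int? y)
    {
        X = x;
        Y = y;
    }

    // Like Nullable<T>.GetHashCode(), except that "no value" gets the sentinel instead of 0.
    private static int HashPart(int? v) => v.HasValue ? v.Value.GetHashCode() : NoValueHash;

    // Lifted == on int?: null == null is true, null == 0 is false.
    public bool Equals(OptionalPairKey other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is OptionalPairKey other && Equals(other);

    public override int GetHashCode() => HashCode.Combine(HashPart(X), HashPart(Y));
}
```

Used as a Dictionary<OptionalPairKey, string> key, this gives (null, 5) and (0, 5) different hash codes while equality is unchanged. The sentinel only helps if real data rarely hashes to that exact value, which is the same trade-off the question makes.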

Your argument is academic, however: changing the implementation costs minus ten thousand points, and you can't do it yourself. The best you can do is submit the suggestion; the proper channel is the UserVoice site. Getting traction on this is going to be difficult. Good luck.

answered Nov 14 '22 09:11 by Hans Passant



Let's first note that this question is purely about performance. The hash code is not required to be unique or collision-resistant for correctness, but a good distribution does help performance.

In fact, this is the main value proposition of a hash table: evenly distributed hash codes lead to O(1) average-case behavior.

So what hash code constant is most likely to lead to the best possible performance profile in real applications?

Certainly not 0, because 0 is a common hash code: 0.GetHashCode() == 0. The same goes for other types (false.GetHashCode() == 0, for example). 0 is the worst candidate precisely because it occurs so often.
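To make the cost concrete, here is a simplified bucket model. It is a sketch that assumes the common "clear the sign bit, then take the remainder" scheme rather than any particular framework's internals; BucketOf and BucketCount are illustrative names.

```csharp
using System;
using System.Linq;

// Simplified model of how a hash table picks a bucket: clear the sign bit,
// then take the remainder. (Illustrative assumption; the real
// Dictionary<TKey,TValue> internals differ in detail.)
static int BucketOf(int hashCode, int bucketCount) => (hashCode & 0x7FFFFFFF) % bucketCount;

const int BucketCount = 7;
bool?[] keys = { null, false, true };

// null and false both hash to 0, so they always land in bucket 0 together,
// no matter how many buckets the table has.
var buckets = keys.GroupBy(k => BucketOf(k.GetHashCode(), BucketCount));

foreach (var bucket in buckets)
{
    string members = string.Join(", ", bucket.Select(k => k.HasValue ? k.Value.ToString() : "null"));
    Console.WriteLine($"bucket {bucket.Key}: {members}");
}
// bucket 0: null, False
// bucket 1: True
```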

So how do we avoid collisions? My proposal:

static readonly int nullableDefaultHashCode = GetRandomInt32();
public override int GetHashCode()
{
    if (!this.HasValue)
        return nullableDefaultHashCode;
    else
        return this.value.GetHashCode();
}

This is evenly distributed, unlikely to collide with common hash codes, and avoids the stylistic problem of choosing an arbitrary constant.

Note that GetRandomInt32 could be implemented as return unchecked((int)0xD523648A); and it would still be more useful than return 0;. But it is probably best to query a cheap source of pseudo-random numbers.
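Since Nullable<T> itself can't be edited, here is a minimal sketch of the proposal on a hypothetical stand-in struct, Maybe<T> (the name is illustrative). GetRandomInt32 is replaced by System.Random as one cheap pseudo-random source; because the field is static in a generic type, it is evaluated once per closed type.

```csharp
using System;

// Hypothetical stand-in for Nullable<T> (the framework type can't be edited).
// It mirrors only the proposed GetHashCode; it is not a full replacement.
readonly struct Maybe<T> where T : struct
{
    // One pseudo-random hash per closed type (Maybe<int>, Maybe<bool>, ...),
    // computed once when the type is initialized. Every empty Maybe<T> of that
    // type shares it, so equal values still have equal hash codes.
    private static readonly int EmptyHashCode = new Random().Next(int.MinValue, int.MaxValue);

    private readonly T _value;

    public bool HasValue { get; }

    public Maybe(T value)
    {
        _value = value;
        HasValue = true;
    }

    public override bool Equals(object obj) =>
        obj is Maybe<T> other
        && HasValue == other.HasValue
        && (!HasValue || _value.Equals(other._value));

    // Same shape as the proposal: defer to the value, or use the shared random constant.
    public override int GetHashCode() => HasValue ? _value.GetHashCode() : EmptyHashCode;
}
```

A per-process random value means the empty hash differs from run to run. That is acceptable for in-memory hash tables (.NET Core already randomizes string hash codes per process), but it rules out persisting these hash codes.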

answered Nov 14 '22 09:11 by usr



In the end, a Nullable<T> without a value has to return a hash code, and that hash code should be a constant.

Returning an arbitrary constant may look safer or more appropriate, perhaps even more so in the specific case of Nullable<int>, but in the end it's just that: a hash.

And across the entire set of types that Nullable<T> can cover (which is unbounded), zero is no better or worse a hash code than any other value.

answered Nov 14 '22 08:11 by Willem van Rumpt
