Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing GetHashCode in C#. Null-value handling

Tags:

c#

.net

hash

Before I begin, all code samples here I tested on Mono environment and there is one noticeable difference in the GetHashCode implementations:

string.Empty.GetHashCode(); // returns 0 in Mono 3.10
string.Empty.GetHashCode(); // returns 757602046 in .NET 4.5.1

I made my implementation based on this SO Answer by @JonSkeet and in the comments he also suggests to use 0 hash code for NULL values (wasn't sure how should I hash them).

I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field.

So having following implementation (Mono 3.10):

public class Entity {
    public int EntityID { get; set; }
    public string EntityName { get; set; }

    public override int GetHashCode() {
        unchecked {
            int hash = 15485863;       // prime number
            int multiplier = 1299709;  // another prime number

            hash = hash * multiplier + EntityID.GetHashCode();
            hash = hash * multiplier + (EntityName != null ? EntityName.GetHashCode() : 0);

            return hash;
        }
    }
}

It is quite easy to find collision e.g.

var hash1 = new Entity { EntityID = 1337, EntityName = "" }.GetHashCode();
var hash2 = new Entity { EntityID = 1337, EntityName = null }.GetHashCode();

bool equals = hash1 == hash2; // true

I could replace null-value 0 with some other number, however it won't fix the problem as there still is a chance that some hash(string) output will generate such number and I'll get another collision.

My question: How should I handle null values while using algorithm from example above?

like image 774
Andriy Horen Avatar asked Jul 03 '15 12:07

Andriy Horen


People also ask

Do I need to implement GetHashCode?

Why is it important to override GetHashCode ? It s important to implement both equals and gethashcode, due to collisions, in particular while using dictionaries. if two object returns same hashcode, they are inserted in the dictionary with chaining. While accessing the item equals method is used.

What is the default implementation of GetHashCode C#?

The default implementation of GetHashCode() for reference types returns a hash code that is equivalent to the one returned by the GetHashCode(Object) method. You can override GetHashCode() for immutable reference types.

What is GetHashCode C#?

A hash code is a numeric value which is used to insert and identify an object in a hash-based collection. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality. Syntax: public virtual int GetHashCode ();

Should I override GetHashCode?

If you're implementing a reference type, you should consider overriding the Equals method if your type looks like a base type, such as Point, String, BigNumber, and so on. Override the GetHashCode method to allow a type to work correctly in a hash table.


2 Answers

Your "problem" here is that you are trying the get collision free hash codes. While this is perfect for the lookup performance of collection implementations that use the hash code for lookup (e.g. HashSet and Dictionary) in the most cases this will not work.

The reason for that is that the hash code is just a 32-bit integer value and it represents data that is usually a lot bigger (multiple integer values, strings, etc.).

So the hash code is only there to define that two objects could be equal. The collection classes use the hash code to refine the area where the object is stored and use the equals function to find if two objects are really the same. For that reason you should always implement the Equals function for classes you implemented the hash code for. While those classes will fall back to the equals function of object, is it also a good idea to implement the IEquatable<T> interface to avoid typing problems of any kind (still overwrite the default equals method of Object!)

like image 123
Nitram Avatar answered Sep 28 '22 16:09

Nitram


My question: How should I handle null values while using algorithm from example above?

I don't think the problem is with null per-se. The problem lays in the fact you're using GetHashCode for equality, which it isn't meant for. GetHashCode should provide such hashes that aspire to normal distribution.

The docs say:

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes.

And then goes on to specify the purpose of GetHashCode:

A hash code is intended for efficient insertion and lookup in collections that are based on a hash table.

You should be implementing IEquatable<Entity>, where you actually define the equivalence relation of two entities. And override != and == while you're at it.

An approximation:

public class Entity : IEquatable<Entity>
{
    public int EntityId { get; set; }
    public string EntityName { get; set; }

    public bool Equals(Entity other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return EntityId == other.EntityId && 
               string.Equals(EntityName, other.EntityName);
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Entity) obj);
    }

    public static bool operator ==(Entity left, Entity right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Entity left, Entity right)
    {
        return !Equals(left, right);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return (EntityId*397) ^ (EntityName != null ? EntityName.GetHashCode() : 0);
        }
    }
}
like image 24
Yuval Itzchakov Avatar answered Sep 28 '22 16:09

Yuval Itzchakov