Before I begin, all code samples here I tested on Mono environment and there is one noticeable difference in the GetHashCode
implementations:
string.Empty.GetHashCode(); // returns 0 in Mono 3.10
string.Empty.GetHashCode(); // returns 757602046 in .NET 4.5.1
I made my implementation based on this SO Answer by @JonSkeet and in the comments he also suggests to use 0 hash code for NULL values (wasn't sure how should I hash them).
I usually use 0 as the effective hash code for null - which isn't the same as ignoring the field.
So having following implementation (Mono 3.10):
public class Entity {
public int EntityID { get; set; }
public string EntityName { get; set; }
public override int GetHashCode() {
unchecked {
int hash = 15485863; // prime number
int multiplier = 1299709; // another prime number
hash = hash * multiplier + EntityID.GetHashCode();
hash = hash * multiplier + (EntityName != null ? EntityName.GetHashCode() : 0);
return hash;
}
}
}
It is quite easy to find collision e.g.
var hash1 = new Entity { EntityID = 1337, EntityName = "" }.GetHashCode();
var hash2 = new Entity { EntityID = 1337, EntityName = null }.GetHashCode();
bool equals = hash1 == hash2; // true
I could replace null-value 0 with some other number, however it won't fix the problem as there still is a chance that some hash(string) output will generate such number and I'll get another collision.
My question: How should I handle null values while using algorithm from example above?
Why is it important to override GetHashCode ? It s important to implement both equals and gethashcode, due to collisions, in particular while using dictionaries. if two object returns same hashcode, they are inserted in the dictionary with chaining. While accessing the item equals method is used.
The default implementation of GetHashCode() for reference types returns a hash code that is equivalent to the one returned by the GetHashCode(Object) method. You can override GetHashCode() for immutable reference types.
A hash code is a numeric value which is used to insert and identify an object in a hash-based collection. The GetHashCode method provides this hash code for algorithms that need quick checks of object equality. Syntax: public virtual int GetHashCode ();
If you're implementing a reference type, you should consider overriding the Equals method if your type looks like a base type, such as Point, String, BigNumber, and so on. Override the GetHashCode method to allow a type to work correctly in a hash table.
Your "problem" here is that you are trying the get collision free hash codes. While this is perfect for the lookup performance of collection implementations that use the hash code for lookup (e.g. HashSet
and Dictionary
) in the most cases this will not work.
The reason for that is that the hash code is just a 32-bit integer value and it represents data that is usually a lot bigger (multiple integer values, strings, etc.).
So the hash code is only there to define that two objects could be equal. The collection classes use the hash code to refine the area where the object is stored and use the equals function to find if two objects are really the same. For that reason you should always implement the Equals
function for classes you implemented the hash code for. While those classes will fall back to the equals function of object, is it also a good idea to implement the IEquatable<T>
interface to avoid typing problems of any kind (still overwrite the default equals method of Object!)
My question: How should I handle null values while using algorithm from example above?
I don't think the problem is with null
per-se. The problem lays in the fact you're using GetHashCode
for equality, which it isn't meant for. GetHashCode
should provide such hashes that aspire to normal distribution.
The docs say:
Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes.
And then goes on to specify the purpose of GetHashCode
:
A hash code is intended for efficient insertion and lookup in collections that are based on a hash table.
You should be implementing IEquatable<Entity>
, where you actually define the equivalence relation of two entities. And override !=
and ==
while you're at it.
An approximation:
public class Entity : IEquatable<Entity>
{
public int EntityId { get; set; }
public string EntityName { get; set; }
public bool Equals(Entity other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return EntityId == other.EntityId &&
string.Equals(EntityName, other.EntityName);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((Entity) obj);
}
public static bool operator ==(Entity left, Entity right)
{
return Equals(left, right);
}
public static bool operator !=(Entity left, Entity right)
{
return !Equals(left, right);
}
public override int GetHashCode()
{
unchecked
{
return (EntityId*397) ^ (EntityName != null ? EntityName.GetHashCode() : 0);
}
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With