
What is an appropriate `GetHashCode()` algorithm for a 2D point struct (avoiding clashes)

Tags: c#, hashcode, point

Consider the following code:

struct Vec2 : IEquatable<Vec2>
{
    public double X, Y; // public so the object initializers below compile

    public bool Equals(Vec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        if (obj is Vec2)
        {
            return Equals((Vec2)obj);
        }
        return false;
    }

    // this will return the same value when X, Y are swapped
    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode();
    }

}

Setting aside the debate about comparing doubles for equality (this is just demo code), what concerns me is that the hashes clash when the X and Y values are swapped. For example:

Vec2 A = new Vec2() { X=1, Y=5 };
Vec2 B = new Vec2() { X=5, Y=1 };

bool test1 = A.Equals(B);  // returns false;
bool test2 = A.GetHashCode() == B.GetHashCode(); // returns true !!!!!

which should wreak havoc in a dictionary collection. So the question is how to properly form the GetHashCode() function for 2, 3 or even 4 floating point values such that the results are not symmetric and the hashes don't clash.

Edit 1:

Point implements the inappropriate x ^ y solution, and PointF wraps ValueType.GetHashCode().

Rectangle has a very peculiar (((X ^ ((Y << 13) | (Y >> 19))) ^ ((Width << 26) | (Width >> 6))) ^ ((Height << 7) | (Height >> 25))) expression for the hash code, which seems to perform as expected.
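
The Rectangle expression above amounts to rotating each field by a different bit count before XORing. A minimal sketch of the same idea for two doubles follows; the type name Vec2Rot, the helper RotateLeft and the rotation amount 13 are illustrative assumptions, not anything taken from the framework source.

```csharp
using System;

// Hypothetical variant of the question's struct, for illustration only.
struct Vec2Rot
{
    public double X, Y;

    // Rotate a 32-bit value left by n bits (0 < n < 32).
    // Working on uint avoids the sign extension that >> performs on int.
    static int RotateLeft(int value, int n)
    {
        uint v = unchecked((uint)value);
        return unchecked((int)((v << n) | (v >> (32 - n))));
    }

    public override int GetHashCode()
    {
        // Rotating Y's hash before the XOR breaks the X/Y symmetry
        // that a plain X ^ Y combination has.
        return X.GetHashCode() ^ RotateLeft(Y.GetHashCode(), 13);
    }
}

static class RotateDemo
{
    static void Main()
    {
        var a = new Vec2Rot { X = 1, Y = 5 };
        var b = new Vec2Rot { X = 5, Y = 1 };
        // Compares the hashes of the swapped pair from the question.
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());
    }
}
```

With the double hash shown in Edit 2, the swapped pair from the question should no longer produce equal hashes, although other pairs can of course still collide.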

Edit 2:

System.Double has a nice implementation, as it does not consider each bit equally important:

public override unsafe int GetHashCode() //from System.Double
{
    double num = this;
    if (num == 0.0)
    {
        return 0;
    }
    long num2 = *((long*) &num);
    return (((int) num2) ^ ((int) (num2 >> 32)));
}

John Alexiou asked Mar 07 '11 15:03


3 Answers

Jon Skeet has this covered:

What is the best algorithm for an overridden System.Object.GetHashCode?

   public override int GetHashCode()
   {
       unchecked // Overflow is fine, just wrap
       {
           int hash = 17;
           // Suitable nullity checks etc, of course :)
           hash = hash * 23 + X.GetHashCode();
           hash = hash * 23 + Y.GetHashCode();
           return hash;
       }
   }
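
As a rough check, the sketch below drops that method into the question's Vec2 and compares the swapped pair. Because each step multiplies the running hash by 23 before adding the next field, the two orderings differ by 22 times the difference of the field hashes (modulo 2^32), so X and Y no longer commute. The class name MultiplyAddDemo is an assumption made for this example.

```csharp
using System;

struct Vec2 : IEquatable<Vec2>
{
    public double X, Y;

    public bool Equals(Vec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        return obj is Vec2 && Equals((Vec2)obj);
    }

    public override int GetHashCode()
    {
        unchecked // overflow simply wraps around
        {
            int hash = 17;
            hash = hash * 23 + X.GetHashCode();
            hash = hash * 23 + Y.GetHashCode();
            return hash;
        }
    }
}

static class MultiplyAddDemo
{
    static void Main()
    {
        var a = new Vec2 { X = 1, Y = 5 };
        var b = new Vec2 { X = 5, Y = 1 };
        Console.WriteLine(a.Equals(b));                         // the values differ
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());  // compare the two hashes
    }
}
```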

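An earlier version of this answer also suggested changing the Equals(object) implementation to:

return Equals(obj as FVector2);

That only applies to classes: the as operator cannot be used with a non-nullable value type, and Vec2 is a struct, so the is check followed by a cast in the question is the right pattern here. (For a class, note that the as form could perceive a derived type to be equal. If you don't want that, you'd have to compare the runtime type other.GetType() with typeof(FVector2), and don't forget nullity checks.) Thanks for pointing out it's a struct, LukH.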

ReSharper has nice code generation for equality and hash code, so if you have ReSharper you can let it do its thing.

Ohad Schneider answered Oct 01 '22 16:10


Hash collisions don't wreak havoc in a dictionary collection. They'll reduce the efficiency if you're unlucky enough to get them, but dictionaries have to cope with them.
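
To illustrate, here is a small sketch that deliberately keeps the symmetric XOR hash and uses the colliding pair from the question as dictionary keys. The type name CollidingVec2 is made up for this example. Both keys land in the same bucket, but Equals still tells them apart, so lookups stay correct.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct that keeps the question's symmetric XOR hash on purpose.
struct CollidingVec2 : IEquatable<CollidingVec2>
{
    public double X, Y;

    public bool Equals(CollidingVec2 other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    public override bool Equals(object obj)
    {
        return obj is CollidingVec2 && Equals((CollidingVec2)obj);
    }

    public override int GetHashCode()
    {
        return X.GetHashCode() ^ Y.GetHashCode(); // same hash when X and Y are swapped
    }
}

static class CollisionDemo
{
    static void Main()
    {
        var a = new CollidingVec2 { X = 1, Y = 5 };
        var b = new CollidingVec2 { X = 5, Y = 1 };

        var map = new Dictionary<CollidingVec2, string>();
        map[a] = "A";
        map[b] = "B"; // same hash code as a, but not equal to a

        Console.WriteLine(map.Count); // 2
        Console.WriteLine(map[a]);    // A
        Console.WriteLine(map[b]);    // B
    }
}
```

The cost of the collision is an extra Equals call per lookup in that bucket, not a wrong result.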

Collisions should be rare if at all possible, but they don't mean the implementation is incorrect. XORs are often bad for the reasons you've given (high collisions) - ohadsc has posted a sample I gave before for an alternative, which should be fine.

Note that it would be impossible to implement Vec2 with no collisions - there are only 2^32 possible return values from GetHashCode, but there are rather more possible X and Y values, even after you've removed NaN and infinite values...

Eric Lippert has a recent blog post on GetHashCode which you may find useful.

Jon Skeet answered Oct 01 '22 15:10


What are reasonable bounds for the coordinates?

Unless they can take all possible integer values, you could simply use:

const int SOME_LARGE_NUMBER = 100000;
return SOME_LARGE_NUMBER * x + y;
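
A hedged sketch of that idea, assuming integer coordinates: the result is only collision-free while it fits in an int, which with a multiplier of 100000 means roughly 0 <= y < 100000 and 0 <= x <= 21473 (an assumed bound, derived from int.MaxValue = 2147483647). The names BoundedHash and Combine are illustrative.

```csharp
using System;

static class BoundedHash
{
    const int SomeLargeNumber = 100000;

    // Assumes 0 <= y < SomeLargeNumber and 0 <= x <= 21473, so the sum
    // fits in an int and distinct (x, y) pairs give distinct results.
    // Outside those bounds the arithmetic wraps and collisions can occur.
    public static int Combine(int x, int y)
    {
        return unchecked(SomeLargeNumber * x + y);
    }

    static void Main()
    {
        Console.WriteLine(Combine(1, 5)); // 100005
        Console.WriteLine(Combine(5, 1)); // 500001
    }
}
```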

KristoferA answered Oct 01 '22 17:10