Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to get a hashcode of a float with epsilon?

It is well known that comparing floats by == is usually a mistake. In a 3D-vector class (with float components X, Y, Z) i wrote, two vectors are considered equal if their distance is considered zero.

public override bool Equals(object obj)
{
    if (obj == null) {
        return false;
    }

    if (GetType () != obj.GetType ()) {
        return false;
    }

    float d = DistSq ((Vec) obj);

    return IsConsideredZero (d);
}

public float DistSq(Vec p)
{
    Vec d = this - p;
    return d.LengthSq ();
}

public float LengthSq()
{
    return X * X + Y * Y + Z * Z;
}

private const float VEC_COMPARE_EPSILON_ABS = 1E-05f;
public static bool IsConsideredZero(float f)
{
    return Math.Abs (f) < VEC_COMPARE_EPSILON_ABS;
}

So far, everything worked fine. However, now i'd like to get a hashcode of the vector. I can see that something like hash = (int)X^(int)Y^(int)Z is bound to fail.

The best i could come up with was:

public override int GetHashCode()
{
    return 0;
}

This, of course, kind of sucks. Is there any way to get a reasonable hashcode? NaNs and other special values are possible, but unlikely, in case that is important.

like image 909
mafu Avatar asked Feb 24 '09 08:02

mafu


4 Answers

It's impossible assuming you want to have the normal hashcode/equality properties:

  • If X = Y and Y = Z then X = Z (transitivity)
  • If X = Y then Y = X (commutivity)
  • X = X for all X (reflexivity)

The first rule is the problem - because if each value is deemed "equal" to the next greater representable number, you end up with all numbers being equal. For instance, suppose a number is deemed equal to another they're within 0.1:

0 equals 0.08 0.08 equals 0.16 0.16 equals 0.24

=> 0 equals 0.16 by the transitivity rule => 0 equals 0.24 by the transitivity rule

(etc)

If you ignore the transitivity rule, then you still (presumably) want "equal" values to have equal hashcodes. This effectively enforces the transitivity rule - in the above example, 0 and 0.08 have to have equal hashcodes, as do 0 and 0.16. Therefore 0 and 0.16 have to have equal hashcodes, and so on. Therefore you can have no useful hashcode - it has to be a constant.

like image 136
Jon Skeet Avatar answered Nov 07 '22 05:11

Jon Skeet


I don't think you can have a hashcode that is consistent with your comparison method because the latter is not transitive: for any three vectors A, B, C, if A.Equals(B) and B.Equals(C) are true, it could still be the case that A.Equals(C) is false. (Imagine if the distance between A and B is 6e-6, between B and C is 6e-6, and between A and C is 1.2e-5) But equality of hashcodes is always transitive, since they're just numbers.

In this case, I'd just create a hashcode method that computes the hash based on the exact values of the floating-point coordinates, and mention in the documentation that it's inconsistent with equals. I know it's not really a solution but given that I don't think a real solution exists, it's better to have a nontrivial hashcode than just 0.

like image 41
David Z Avatar answered Nov 07 '22 05:11

David Z


I'm afraid it is not in the general case. A sketch of a proof goes like this:

Take any two numbers a and b. Let the difference between them be d. Then if you create the d/epsilon numbers with an epsilon step in between, each step must be "equal" to the step before, which by hashcode semantics have the same hashcode. So all numbers must have the same hashcode.

You can only solve this problem if you add some other constraint.

As an aside, you definition of Equals is wrong as well, as it can be true that a.Equals(b) and b.Equals(c) but not a.Equals(c), which is wrong for equals. This is known as breaking the Transitivity property.

What can I do then?

The solution depends on what you are using the hash for. One solution would be to introduce a conceptual grid. Change the equals and hashcode so two numbers are equal if in the same grid cube, by rounding to a constant number of decimal places, then taking equals and hashcode on the rounded number. If being close to zero is an important case, add a offset of epsilon/2 before rounding, so zero is the centre of the cube. This is correct, but you can have two numbers arbitrarily close together (under the limits of float) without being equal. So for some applications it will be ok, others it won't be. This is similar to an idea from mghie.

like image 7
Nick Fortescue Avatar answered Nov 07 '22 05:11

Nick Fortescue


Everybody is correct ...

HOWEVER, one thing that is often done is to extend the concept of hash a bit. Consider a partition of your 3d space with boxes with a side >> epsilon.

The hash of a point is the box it belongs to. When you want to lookup for a point, you don't check for the point with the corresponding box (as you would do for a regular hash) but for the neighboring boxes as well. In 3d you should get away with max 8 boxes.

like image 4
fulmicoton Avatar answered Nov 07 '22 06:11

fulmicoton