Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoid duplicates in HashSet<double> for slightly different values

I'm trying to skip duplicates for the list of values below. The result I'm trying to achieve is -112.94487230674, -49.47838592529, -89.9999574979198, I'm using the HashSet class. I've implemented the IEqualityComparer below but it's not working.

What am I doing wrong?

    class HeightEqualityComparer : IEqualityComparer<double>
    {
        public bool Equals(double a, double b)
        {
            return a - b < 1e-3;
        }

        public int GetHashCode(double value)
        {
            return value.GetHashCode();
        }
    }

Here is the list of values:

    [0] -112.94487230674    double
    [1] -112.94487230674001 double
    [2] -49.478385925290006 double
    [3] -49.47838592529     double
    [4] -49.478385925289992 double
    [5] -89.9999574979198   double
    [6] -89.99995749791978  double
    [7] -49.478385925289984 double
    [8] -89.999957497919809 double
    [9] -49.478385925290013 double
like image 405
abenci Avatar asked Jun 30 '20 07:06

abenci


People also ask

How HashSet prevent duplicates in collections?

While creating a HashSet of your own class, always ensure that the HashCode() method of the key of HashSet doesn't change. Java Object hashCode() is a native method and returns the integer hash code value of the object. If two objects are equal according to the equals() method, then their hash code must be the same.

How do you prevent duplicate elements in a set?

for the first time add SOP will return true. then for the second time of added "123", SOP will return false. With this we can understand that, if we add same value in the Set for the second time or more, then the duplicated value will be skipped.

How does set ensure that there are no duplicates Select all that apply?

If this set already contains the element, duplicate is ignored, leaves the set unchanged and returns false. This ensures that sets never contain duplicate elements.

Will a HashSet allow duplicate values?

Duplicates: HashSet doesn't allow duplicate values. HashMap stores key, value pairs and it does not allow duplicate keys.


3 Answers

Well, the current implementation of Equals

   return a - b < 1e-3;

is incorrect one. Equals must be

  1. Equals(a, a) == true;
  2. Symmetric: Equals(a, b) == Equals(b, a);
  3. Transitive Equals(a, b) && Equals(b, c) leads to Equals(a, c);

Conditions 2 and 3 are not hold in the current implementation. It's easy to repair the second condition with a help of Math.Abs; the third one is real difficulty: for arbitrary positive tolerance (which is 1e-3 in your case) we have

   a == a + tolerance == a + 2 * tolerance == ... == a + n * tolerance  

which means

   a == a + n * tolerance

for abitrary large n; thus a == b for all a and b (all numbers are equal).

As a partial (but valid) solution you can try rounding the values:

   class HeightEqualityComparer : IEqualityComparer<double>
   {
       public bool Equals(double a, double b)
       {
           return Math.Round(a, 3) == Math.Round(b, 3);
       }

       public int GetHashCode(double value)
       {
           return Math.Round(value, 3).GetHashCode();
       }
   } 

Note, that we have to change GetHashCode

like image 75
Dmitry Bychenko Avatar answered Oct 20 '22 06:10

Dmitry Bychenko


You'd need to round your values in GetHashCode to the same precision you are eliminating in the equality.

like image 2
cineam mispelt Avatar answered Oct 20 '22 07:10

cineam mispelt


Check John Skeets' answer here: How does HashSet compare elements for equality?

When you add an element to the set, it will find the hash code using IEqualityComparer.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).

In your code, all the values return different hash codes. You need to make sure that close values return the same.

like image 1
Athanasios Kataras Avatar answered Oct 20 '22 06:10

Athanasios Kataras