
GroupBy of List<double> with tolerance doesn't work [duplicate]

Tags: c#, linq

I have a question about GroupBy in C#.

I created a list as shown below:

List<double> testList = new List<double>();

testList.Add(1);    
testList.Add(2.1);  
testList.Add(2.0);  
testList.Add(3.0);  
testList.Add(3.1);  
testList.Add(3.2);  
testList.Add(4.2);  

I'd like to group these numbers like this:

group 1 => 1  
group 2 => 2.1 , 2.0  
group 3 => 3.0 , 3.1 , 3.2  
group 4 => 4.2

So I wrote this code:

var testListGroup = testList.GroupBy(ele => ele, new DoubleEqualityComparer(0.5));

DoubleEqualityComparer is defined like this:

public class DoubleEqualityComparer : IEqualityComparer<double>
{
    private double tol = 0;

    public DoubleEqualityComparer(double Tol)
    {
        tol = Tol;
    }

    public bool Equals(double d1, double d2)
    {
        return EQ(d1, d2, tol);
    }

    public int GetHashCode(double d)
    {
        return d.GetHashCode();
    }

    public bool EQ(double dbl, double compareDbl, double tolerance)
    {
        return Math.Abs(dbl - compareDbl) < tolerance;
    }
}

Yet GroupBy doesn't produce those groups. Instead, every number ends up in its own group:

group 1 => 1  
group 2 => 2.1
group 3 => 2.0  
group 4 => 3.0
group 5 => 3.1
group 6 => 3.2
group 7 => 4.2

I don't know what the problem is. Please let me know what is wrong and how to fix it.

kdm asked Sep 07 '16 04:09


5 Answers

Use Math.Floor to group by the integer part of each number, so that 5.8 is not treated as 6.

List<double> testList = new List<double>();

testList.Add(1);
testList.Add(2.1);
testList.Add(2.0);
testList.Add(3.0);
testList.Add(3.1);
testList.Add(3.2);
testList.Add(4.2);
testList.Add(5.8);
testList.Add(5.5);

var testListGroup = testList.GroupBy(s => Math.Floor(s)).ToList();

A.T. answered Nov 18 '22 13:11


Your GetHashCode method should return the same value for numbers that should be considered "equal".

GroupBy uses the equality comparer in two steps:

  1. GetHashCode is checked first. If no value with this hash code has been processed yet, the value goes into a new group.

  2. If a value with this hash code has already been seen, the result of Equals is checked. If it is true, the current element is added to the existing group; otherwise it goes into a new group.

In your case every double returns a different hash code, so the Equals method is never called.
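The effect can be observed without a debugger. The following is a minimal sketch, assuming a hypothetical `CountingComparer` (not part of the question) that uses the same logic as DoubleEqualityComparer and additionally counts how often Equals is invoked:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical instrumented comparer: same logic as the question's
// DoubleEqualityComparer, plus a call counter for illustration.
public class CountingComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public int EqualsCalls { get; private set; }

    public CountingComparer(double tol) { this.tol = tol; }

    public bool Equals(double d1, double d2)
    {
        EqualsCalls++;
        return Math.Abs(d1 - d2) < tol;
    }

    // Distinct doubles such as 2.0 and 2.1 produce distinct hash codes here.
    public int GetHashCode(double d)
    {
        return d.GetHashCode();
    }
}

public static class CountingDemo
{
    // Groups the question's sample data and reports what happened.
    public static (int Groups, int EqualsCalls) Run()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        var comparer = new CountingComparer(0.5);
        int groups = testList.GroupBy(d => d, comparer).Count();
        return (groups, comparer.EqualsCalls);
    }

    public static void Main()
    {
        var result = Run();
        Console.WriteLine($"groups: {result.Groups}");            // groups: 7
        Console.WriteLine($"Equals calls: {result.EqualsCalls}"); // Equals calls: 0
    }
}
```

Seven groups and zero Equals calls match the behaviour described in the question.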

So, if you do not care about processing time, you can simply return a constant value from GetHashCode, as @FirstCall suggested. If you do care about it, I recommend modifying your method as follows:

public int GetHashCode(double d)
{
    return Math.Round(d).GetHashCode();
}

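Math.Round works for the sample data with tolerance = 0.5. It is not a general solution: 2.4 and 2.6 are within the tolerance but round to different integers, so they would still get different hash codes. For other data or other tolerance values you will need to adapt this.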
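As a rough check of how far the Math.Round hash carries, here is a small sketch. The names `RoundHashComparer` and `RoundHashDemo` are made up for this illustration. It groups the question's sample data, and then a pair of values that lie on opposite sides of a rounding boundary:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer combining the question's Equals with the
// Math.Round-based GetHashCode suggested above.
public class RoundHashComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public RoundHashComparer(double tol) { this.tol = tol; }

    public bool Equals(double d1, double d2)
    {
        return Math.Abs(d1 - d2) < tol;
    }

    public int GetHashCode(double d)
    {
        return Math.Round(d).GetHashCode();
    }
}

public static class RoundHashDemo
{
    // Number of groups GroupBy produces with tolerance 0.5.
    public static int CountGroups(params double[] values)
    {
        return values.GroupBy(v => v, new RoundHashComparer(0.5)).Count();
    }

    public static void Main()
    {
        // Sample data from the question: the four expected groups.
        Console.WriteLine(CountGroups(1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2)); // 4

        // 2.4 rounds to 2 and 2.6 rounds to 3, so the hashes differ and
        // Equals is never consulted, although the values are only 0.2 apart.
        Console.WriteLine(CountGroups(2.4, 2.6)); // 2
    }
}
```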

I recommend reading this blog post to get familiar with IEqualityComparer and LINQ.

The simplest way, with the least code, is to always return a constant value from GetHashCode. It works for any tolerance value but, as I wrote, it is quite inefficient on large amounts of data, because every element then has to be compared against the existing groups with Equals.
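For reference, a minimal sketch of the constant-hash variant applied to the question's data. `ConstantHashComparer` and `ConstantHashDemo` are illustrative names, not code from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical comparer: the question's tolerance check plus a constant hash.
public class ConstantHashComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public ConstantHashComparer(double tol) { this.tol = tol; }

    public bool Equals(double d1, double d2)
    {
        return Math.Abs(d1 - d2) < tol;
    }

    // Every value lands in the same hash bucket, so Equals decides everything.
    public int GetHashCode(double d)
    {
        return 0;
    }
}

public static class ConstantHashDemo
{
    // Returns one comma-separated string per group.
    public static List<string> Describe(IEnumerable<double> values, double tol)
    {
        return values
            .GroupBy(v => v, new ConstantHashComparer(tol))
            .Select(g => string.Join(", ",
                g.Select(v => v.ToString(CultureInfo.InvariantCulture))))
            .ToList();
    }

    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        foreach (var line in Describe(testList, 0.5))
        {
            Console.WriteLine(line);
        }
        // Prints four lines: "1", "2.1, 2", "3, 3.1, 3.2", "4.2"
    }
}
```

Because all elements share one hash code, each new element may be compared against every existing group, which is where the cost on large inputs comes from.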

Mikhail Tulubaev answered Nov 18 '22 13:11


In these types of situations, the debugger is your friend. Put a breakpoint on the Equals method. You will notice that the Equals method of your DoubleEqualityComparer class is never hit.

LINQ extension methods such as GroupBy compare the hash codes from GetHashCode before they call Equals. Since GetHashCode does not return the same hash for the doubles in your list, Equals is never called.

GetHashCode should be deterministic and must return the same int value for any two values that Equals treats as equal.

This is one working example, though it is not necessarily recommended depending on your usage of this comparer.

public int GetHashCode(double d)
{
    return 1;
}

FirstCall answered Nov 18 '22 12:11


You can group using the code sample below:

var list = testList.GroupBy(s => Convert.ToInt32(s) ).Select(group => new { Key = group.Key, Elements = group.ToList() });

// Output
//group 1 => 1  
//group 2 => 2.1 , 2  
//group 3 => 3 , 3.1 , 3.2  
//group 4 => 4.2

Explanation of the code: when we apply GroupBy to a list that has only a single column of data, it groups elements with identical content. For example, suppose you have a string list (foo1, foo2, foo3, foo1, foo1, foo2). GroupBy then produces three separate groups, keyed by foo1, foo2 and foo3.

But in this scenario no two values are identical (1.0, 2.1, 2.2, 2.3, 3.1, 3.2, ...), so we have to map them to a common key first. Converting them to int gives (1, 2, 2, 2, 3, 3, ...), which is easy to group. Note that Convert.ToInt32 rounds to the nearest integer rather than truncating, so a value such as 2.6 would land in group 3.
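The choice of key function matters once values have a fractional part of 0.5 or more. The sketch below uses the made-up helper class `KeyFunctionDemo` to compare the keys produced by Math.Floor with those produced by Convert.ToInt32 for the same input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyFunctionDemo
{
    // Group keys when grouping by the integer part (rounds down).
    public static List<int> FloorKeys(IEnumerable<double> values)
    {
        return values.GroupBy(v => (int)Math.Floor(v))
                     .Select(g => g.Key)
                     .ToList();
    }

    // Group keys when grouping by Convert.ToInt32 (rounds to nearest,
    // midpoints go to the even neighbour).
    public static List<int> ConvertKeys(IEnumerable<double> values)
    {
        return values.GroupBy(v => Convert.ToInt32(v))
                     .Select(g => g.Key)
                     .ToList();
    }

    public static void Main()
    {
        var values = new[] { 5.2, 5.5, 5.8 };

        // Math.Floor keeps all three values together under key 5.
        Console.WriteLine(string.Join(", ", FloorKeys(values)));   // 5

        // Convert.ToInt32 maps 5.2 to 5, but 5.5 and 5.8 to 6.
        Console.WriteLine(string.Join(", ", ConvertKeys(values))); // 5, 6
    }
}
```

For the question's sample data both key functions give the same four groups, since no value there has a fractional part above 0.2.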

SilentCoder answered Nov 18 '22 13:11


Everyone here is discussing what is wrong with your code, but you may actually have a worse problem than that.

If you truly want to group with a tolerance like your title says, rather than group by integer part like these answers assume (and your test data supports), this isn't supported by GroupBy.

GroupBy demands an equivalence relation - your equality comparer must establish that

  • x == x for all x
  • if x == y, y == x for all x and y
  • if x == y and y == z, x == z for all x, y and z

"Within 0.5 of each other" matches the first two points, but not the third. 0 is close to 0.4, and 0.4 is close to 0.8, but 0 is not close to 0.8. Given an input of 0, 0.4 and 0.8, what groups would you expect?

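To make the consequence concrete, here is a sketch under two assumptions. First, it uses a tolerance comparer with a constant hash code so that Equals is always consulted. Second, it relies on LINQ to Objects comparing each element against the key (first element) of each existing group, which is how the current implementation behaves. `ToleranceComparer`, `TransitivityDemo` and `GroupByGap` are hypothetical names. `GroupByGap` shows one possible alternative, which sorts the values and splits wherever the gap to the previous value reaches the tolerance. Whether that is the grouping you want depends on your requirements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tolerance comparer with a constant hash so that Equals is always consulted.
public class ToleranceComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public ToleranceComparer(double tol) { this.tol = tol; }

    public bool Equals(double a, double b)
    {
        return Math.Abs(a - b) < tol;
    }

    public int GetHashCode(double d)
    {
        return 0;
    }
}

public static class TransitivityDemo
{
    // Number of groups GroupBy produces for the given input order.
    public static int CountGroups(IEnumerable<double> values, double tol)
    {
        return values.GroupBy(v => v, new ToleranceComparer(tol)).Count();
    }

    // Hypothetical alternative: sort, then start a new group whenever the
    // gap to the previous value reaches the tolerance. Close values chain
    // together, so a group may span more than the tolerance overall.
    public static List<List<double>> GroupByGap(IEnumerable<double> values, double tol)
    {
        var result = new List<List<double>>();
        foreach (var v in values.OrderBy(x => x))
        {
            var last = result.LastOrDefault();
            if (last != null && v - last[last.Count - 1] < tol)
            {
                last.Add(v);
            }
            else
            {
                result.Add(new List<double> { v });
            }
        }
        return result;
    }

    public static void Main()
    {
        // Same values, different order, different number of groups.
        Console.WriteLine(CountGroups(new[] { 0.0, 0.4, 0.8 }, 0.5)); // 2
        Console.WriteLine(CountGroups(new[] { 0.4, 0.0, 0.8 }, 0.5)); // 1

        // Gap-based grouping does not depend on input order.
        Console.WriteLine(GroupByGap(new[] { 0.0, 0.4, 0.8 }, 0.5).Count); // 1
    }
}
```

The first two lines show that GroupBy's result depends on input order when the comparer is not transitive, which is exactly the problem described above.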
Rawling answered Nov 18 '22 13:11