I have a question about GroupBy in C#.
I made a List like the one shown below:
List<double> testList = new List<double>();
testList.Add(1);
testList.Add(2.1);
testList.Add(2.0);
testList.Add(3.0);
testList.Add(3.1);
testList.Add(3.2);
testList.Add(4.2);
I'd like to group the numbers in this list like this:
group 1 => 1
group 2 => 2.1 , 2.0
group 3 => 3.0 , 3.1 , 3.2
group 4 => 4.2
So I wrote code like this:
var testListGroup = testList.GroupBy(ele => ele, new DoubleEqualityComparer(0.5));
The DoubleEqualityComparer definition is like this:
public class DoubleEqualityComparer : IEqualityComparer<double>
{
    private double tol = 0;

    public DoubleEqualityComparer(double Tol)
    {
        tol = Tol;
    }

    public bool Equals(double d1, double d2)
    {
        return EQ(d1, d2, tol);
    }

    public int GetHashCode(double d)
    {
        return d.GetHashCode();
    }

    public bool EQ(double dbl, double compareDbl, double tolerance)
    {
        return Math.Abs(dbl - compareDbl) < tolerance;
    }
}
Yet the GroupBy call doesn't work as expected. Instead, it produces this:
group 1 => 1
group 2 => 2.1
group 3 => 2.0
group 4 => 3.0
group 5 => 3.1
group 6 => 3.2
group 7 => 4.2
I don't know what the problem is. Please let me know what is wrong and how to fix it.
Use Math.Floor to take the lower bound of each number, so that 5.8 is not treated as 6.
List<double> testList = new List<double>();
testList.Add(1);
testList.Add(2.1);
testList.Add(2.0);
testList.Add(3.0);
testList.Add(3.1);
testList.Add(3.2);
testList.Add(4.2);
testList.Add(5.8);
testList.Add(5.5);
var testListGroup = testList.GroupBy(s => Math.Floor(s)).ToList();
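As a minimal sketch, the Math.Floor approach above can be wrapped in a small runnable program that prints each group. The class name FloorGroupingDemo and the helper GroupByFloor are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original answer.
public static class FloorGroupingDemo
{
    // Groups values by their integer part (Math.Floor), keeping input order.
    public static List<List<double>> GroupByFloor(IEnumerable<double> values)
    {
        return values.GroupBy(v => Math.Floor(v))
                     .Select(g => g.ToList())
                     .ToList();
    }

    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2, 5.8, 5.5 };

        // Prints one line per group, with the members separated by " , ".
        foreach (var group in GroupByFloor(testList))
        {
            Console.WriteLine(string.Join(" , ", group));
        }
    }
}
```

With this key, 5.8 and 5.5 end up together in the group for 5. Note that Math.Floor rounds toward negative infinity, so negative inputs such as -0.5 would fall into the group for -1.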
Your GetHashCode method should return the same value for numbers that should be treated as "equal".
An equality comparer is used in two steps:
1. GetHashCode is checked first. If no value with this hash code has been processed yet, the value goes into a new group of its own.
2. If a value with this hash code has already been seen, the result of the Equals method is checked. If it is true, the current element is added to the existing group; otherwise it is added to a new group.
In your case every double returns a different hash code, so the Equals method is never called.
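To see the two steps in action, here is a small sketch that counts how often Equals is called. CountingComparer and HashFirstDemo are hypothetical names; the comparer uses the same logic as the one in the question, with a counter added.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: same logic as the question's, plus a call counter.
public class CountingComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public int EqualsCalls { get; private set; }

    public CountingComparer(double tol)
    {
        this.tol = tol;
    }

    public bool Equals(double d1, double d2)
    {
        EqualsCalls++;
        return Math.Abs(d1 - d2) < tol;
    }

    // Same as in the question: each distinct double gets its own hash code.
    public int GetHashCode(double d)
    {
        return d.GetHashCode();
    }
}

public static class HashFirstDemo
{
    // Returns the number of groups and the number of Equals calls.
    public static (int Groups, int EqualsCalls) Run(IEnumerable<double> values, double tol)
    {
        var comparer = new CountingComparer(tol);
        int groups = values.GroupBy(d => d, comparer).Count();
        return (groups, comparer.EqualsCalls);
    }

    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        var result = Run(testList, 0.5);
        Console.WriteLine($"groups: {result.Groups}, Equals calls: {result.EqualsCalls}");
    }
}
```

On the sample data this should report seven groups and no Equals calls, because none of the seven values share a hash code.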
So, if you do not care about processing time, you can simply return a constant value from the GetHashCode method, as @FirstCall suggested. If you do care about it, I recommend modifying your method as follows:
public int GetHashCode(double d)
{
    return Math.Round(d).GetHashCode();
}
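Math.Round works for the sample data with tolerance = 0.5 (values on opposite sides of a rounding boundary, such as 2.4 and 2.6, still get different hash codes). For other tolerance values you would need to improve this.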
I recommend reading this blog post to get familiar with IEqualityComparer and LINQ.
The simplest way, with the least amount of code, is to always return a constant value from GetHashCode. It will work for any tolerance value but, as I wrote, it is quite an inefficient solution for large amounts of data.
In these types of situations, the debugger is your friend. Put a breakpoint on the Equals method. You will notice that the Equals method of your DoubleEqualityComparer class is not getting hit.
LINQ extension methods rely on GetHashCode for equality comparisons. Since the GetHashCode method is not returning equivalent hashes for the doubles in your list, the Equals method is not getting called.
Each GetHashCode method should be atomic in execution and must return the same int value for any two values that compare as equal.
This is one working example, though it is not necessarily recommended depending on your usage of this comparer.
public int GetHashCode(double d)
{
    return 1;
}
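For completeness, here is a sketch of the whole comparer with the constant hash code, applied to the list from the question. ConstantHashComparer and ConstantHashDemo are hypothetical names, and the trade-off is that every element lands in the same hash bucket, so each lookup has to call Equals against the existing group keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: tolerance-based Equals with a constant hash code.
public class ConstantHashComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public ConstantHashComparer(double tol)
    {
        this.tol = tol;
    }

    public bool Equals(double d1, double d2)
    {
        return Math.Abs(d1 - d2) < tol;
    }

    // Every value hashes to the same bucket, so Equals always decides.
    public int GetHashCode(double d)
    {
        return 1;
    }
}

public static class ConstantHashDemo
{
    // Returns the groups as lists, in the order GroupBy yields them.
    public static List<List<double>> Group(IEnumerable<double> values, double tol)
    {
        return values.GroupBy(d => d, new ConstantHashComparer(tol))
                     .Select(g => g.ToList())
                     .ToList();
    }

    public static void Main()
    {
        var testList = new List<double> { 1, 2.1, 2.0, 3.0, 3.1, 3.2, 4.2 };
        foreach (var group in Group(testList, 0.5))
        {
            Console.WriteLine(string.Join(" , ", group));
        }
    }
}
```

On the question's data this should yield the four groups the asker wanted, since each value is within 0.5 of exactly one group key.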
You can group using the code sample below:
var list = testList
    .GroupBy(s => Convert.ToInt32(s))
    .Select(group => new { Key = group.Key, Elements = group.ToList() });
// Output
// group 1 => 1
// group 2 => 2.1 , 2
// group 3 => 3 , 3.1 , 3.2
// group 4 => 4.2
Explanation of the code:
When we apply GroupBy to a list that has only a single data column, it groups elements with identical content. For example, suppose you have a string list (foo1, foo2, foo3, foo1, foo1, foo2). It is grouped into three separate groups, keyed by foo1, foo2 and foo3.
But in this scenario no two values are identical (1.0, 2.1, 2.2, 2.3, 3.1, 3.2...), so we first have to map them to a common value. When we convert them to int, we get (1, 2, 2, 2, 3, 3...), which we can then easily group.
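One caveat worth checking before relying on this: Convert.ToInt32 rounds a double to the nearest integer rather than truncating it, so it behaves differently from Math.Floor for fractional parts above 0.5. The sketch below compares the two keys on a pair of values that are not in the original list; RoundingKeyDemo and its helpers are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo comparing two possible grouping keys.
public static class RoundingKeyDemo
{
    // Group keys when Convert.ToInt32 (round to nearest) is the key selector.
    public static List<int> KeysByConvert(IEnumerable<double> values)
    {
        return values.GroupBy(v => Convert.ToInt32(v)).Select(g => g.Key).ToList();
    }

    // Group keys when Math.Floor (round down) is the key selector.
    public static List<double> KeysByFloor(IEnumerable<double> values)
    {
        return values.GroupBy(v => Math.Floor(v)).Select(g => g.Key).ToList();
    }

    public static void Main()
    {
        var values = new List<double> { 2.4, 2.6 };

        // Convert.ToInt32 separates the two values; Math.Floor keeps them together.
        Console.WriteLine("Convert.ToInt32 keys: " + string.Join(", ", KeysByConvert(values)));
        Console.WriteLine("Math.Floor keys: " + string.Join(", ", KeysByFloor(values)));
    }
}
```

Here 2.4 and 2.6 get the keys 2 and 3 with Convert.ToInt32 but share the key 2 with Math.Floor, so which selector is right depends on where you want the group boundaries.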
Everyone here is discussing what is wrong with your code, but you may actually have a worse problem than that.
If you truly want to group with a tolerance like your title says, rather than group by integer part like these answers assume (and your test data supports), this isn't supported by GroupBy.
GroupBy demands an equivalence relation - your equality comparer must establish that:
- x == x for all x
- x == y implies y == x, for all x and y
- x == y and y == z imply x == z, for all x, y and z
"Within 0.5 of each other" matches the first two points, but not the third. 0 is close to 0.4, and 0.4 is close to 0.8, but 0 is not close to 0.8. Given an input of 0, 0.4 and 0.8, what groups would you expect?
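One way to see the consequence is to run a tolerance comparer on the same three numbers in two different orders. The sketch below assumes a comparer with a constant hash code so that Equals is always consulted; ToleranceComparer and TransitivityDemo are hypothetical names, and the behavior shown reflects how GroupBy compares each element against existing group keys rather than any documented guarantee.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: "within tol of each other", which is not transitive.
public class ToleranceComparer : IEqualityComparer<double>
{
    private readonly double tol;

    public ToleranceComparer(double tol)
    {
        this.tol = tol;
    }

    public bool Equals(double d1, double d2)
    {
        return Math.Abs(d1 - d2) < tol;
    }

    // Constant hash so that Equals is consulted for every element.
    public int GetHashCode(double d)
    {
        return 1;
    }
}

public static class TransitivityDemo
{
    // Counts the groups GroupBy produces for the given input order.
    public static int CountGroups(IEnumerable<double> values, double tol)
    {
        return values.GroupBy(d => d, new ToleranceComparer(tol)).Count();
    }

    public static void Main()
    {
        // Same numbers, different order.
        Console.WriteLine(CountGroups(new[] { 0.0, 0.4, 0.8 }, 0.5));
        Console.WriteLine(CountGroups(new[] { 0.4, 0.0, 0.8 }, 0.5));
    }
}
```

The first ordering should give two groups (0.8 is too far from the key 0), while the second should give one group (both 0 and 0.8 are within 0.5 of the key 0.4). The result depends on input order, which is the symptom of the missing transitivity.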