
Linq: GroupBy vs Distinct

Tags: c#, linq

I've been trying to get a LINQ query to return distinct values from a collection. I've found two ways to go about it: use either GroupBy or Distinct. I know that Distinct was made for the job, but to use it I have to implement IEquatable on the object.
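For concreteness, the two approaches can be sketched as follows. The `Person` type, its properties, and the `DistinctDemo` helper are hypothetical names made up for this illustration; the original question does not show the object involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only; not taken from the question.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age) { Name = name; Age = age; }

    // Distinct() with the default comparer ends up calling these members.
    public bool Equals(Person other) =>
        other != null && Name == other.Name && Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Person);

    public override int GetHashCode()
    {
        unchecked
        {
            return ((Name != null ? Name.GetHashCode() : 0) * 397) ^ Age;
        }
    }
}

public static class DistinctDemo
{
    public static List<Person> Sample() => new List<Person>
    {
        new Person("Ann", 30),
        new Person("Bob", 25),
        new Person("Ann", 30), // duplicate by value, different reference
    };

    // Option 1: relies on Person's Equals/GetHashCode.
    public static List<Person> ViaDistinct(IEnumerable<Person> people) =>
        people.Distinct().ToList();

    // Option 2: needs no equality members on Person; the anonymous-type
    // key provides value-based equality, and First() picks one item per group.
    public static List<Person> ViaGroupBy(IEnumerable<Person> people) =>
        people.GroupBy(p => new { p.Name, p.Age })
              .Select(g => g.First())
              .ToList();
}
```

With the sample data, both `DistinctDemo.ViaDistinct(DistinctDemo.Sample())` and `DistinctDemo.ViaGroupBy(DistinctDemo.Sample())` return two items.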

I tried GroupBy and that worked just fine. I want to know if using Distinct vs GroupBy has a distinct performance advantage.

Varun Rathore asked Feb 27 '14 10:02




1 Answer

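Distinct() compares entire objects in the collection (for your own reference types you need to override GetHashCode and Equals; otherwise the default comparer falls back to reference equality). It enumerates the items and adds each one to a set, yielding only the items that were not already in it. Simple and fast. Something like: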

Set<TSource> set = new Set<TSource>(comparer);

foreach (TSource tSource in source)
{
     if (!set.Add(tSource))
          continue;

     yield return tSource;
}
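As far as I know, the `Set<TSource>` used in the loop above is not a public type, so it cannot be used directly from application code. A minimal sketch of the same idea with the public `HashSet<T>` is shown here; `MyDistinct` and `DistinctSketch` are hypothetical names for this illustration, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Hypothetical helper; mirrors the loop above with HashSet<T>.
    public static IEnumerable<T> MyDistinct<T>(
        this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterator(source, comparer);
    }

    private static IEnumerable<T> Iterator<T>(
        IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        // A null comparer makes HashSet<T> use EqualityComparer<T>.Default.
        var seen = new HashSet<T>(comparer);
        foreach (var item in source)
        {
            // Add returns false when an equal item is already in the set.
            if (seen.Add(item))
                yield return item;
        }
    }
}
```

Passing a comparer is an alternative to implementing equality on the element type itself, for example `names.MyDistinct(StringComparer.OrdinalIgnoreCase)`; the built-in `Distinct` has an overload that accepts an `IEqualityComparer<T>` in the same way.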

GroupBy() lets you group objects by some key; in this case the keys are compared rather than the objects themselves. It has to execute the key selector lambda for each item in the collection. It also has to create a grouping for each distinct key and add each item in the collection to its group:

Func<TSource, TElement> elementSelector = x => x;

Lookup<TKey, TElement> lookup = new Lookup<TKey, TElement>(comparer);
foreach (TSource tSource in source)
{
     TKey key = keySelector(tSource);

     // simplified pseudo-code
     if (!lookup.Contains(key))
          lookup.Add(new Grouping(key)); 

     lookup[key].Add(elementSelector(tSource));
}

foreach(IGrouping<TKey, TElement> grouping in lookup)
    yield return grouping;

So I think GroupBy() is not as fast as a simple Distinct().
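One way to check this on your own data is a rough timing harness like the sketch below. The class and method names, the data size, and the iteration count are all assumptions chosen for illustration, and `Stopwatch` timings of this kind are only indicative; a dedicated benchmarking library would give more reliable numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class DistinctVsGroupBy
{
    // Synthetic data: 'count' integers drawn from 'distinctValues' possible values.
    public static int[] MakeData(int count, int distinctValues, int seed = 42)
    {
        var rng = new Random(seed);
        var data = new int[count];
        for (int i = 0; i < count; i++)
            data[i] = rng.Next(distinctValues);
        return data;
    }

    public static List<int> WithDistinct(IEnumerable<int> source) =>
        source.Distinct().ToList();

    // Builds a grouping per key, then keeps only the keys.
    public static List<int> WithGroupBy(IEnumerable<int> source) =>
        source.GroupBy(x => x).Select(g => g.Key).ToList();

    // Rough wall-clock timing over several iterations.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up run so JIT compilation is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.Elapsed;
    }

    public static void Run()
    {
        var data = MakeData(1000000, 1000);
        Console.WriteLine("Distinct: " + Time(() => WithDistinct(data), 10));
        Console.WriteLine("GroupBy:  " + Time(() => WithGroupBy(data), 10));
    }
}
```

Calling `DistinctVsGroupBy.Run()` prints the two elapsed times; the actual figures depend on the runtime, the element type, and how many distinct keys the data contains.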

Sergey Berezovskiy answered Oct 13 '22 00:10