LINQ: GroupBy with maximum count in each group

Tags: linq

I have a list of duplicate numbers:

Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
// {1,1,1,2,2,2,3,3,3}

I group them and get the number of occurrences:

Enumerable.Range(1,3).Select(o => Enumerable.Repeat(o, 3)).SelectMany(o => o)
    .GroupBy(o => o).Select(o => new { Qty = o.Count(), Num = o.Key })

Qty   Num
3     1
3     2
3     3

What I really need is to cap the quantity per row at some limit, splitting a group into several rows when it exceeds that limit. If the limit is 2, the result for the above grouping would be:

Qty   Num
2     1
1     1
2     2
1     2
2     3
1     3

So, if Qty = 10 and the limit is 4, the result is 3 rows (4, 4, 2). In my real data the Qty of each number is not equal as it is in the example. The Qty limit is the same for the whole list (it doesn't differ by number).
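
To make the expected output concrete: each number's quantity becomes qty / limit full rows, plus one more row for any remainder. A minimal sketch of that arithmetic, assuming a non-negative quantity and a positive limit; `ChunkSizes.For` is a hypothetical helper name, not something from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkSizes
{
    // Hypothetical helper: the row sizes for one number's quantity.
    // Assumes qty >= 0; limit must be positive.
    public static List<int> For(int qty, int limit)
    {
        if (limit <= 0) throw new ArgumentOutOfRangeException(nameof(limit));

        // qty / limit full rows, each holding exactly `limit` items.
        var sizes = Enumerable.Repeat(limit, qty / limit).ToList();

        // One extra row for whatever is left over, if anything.
        if (qty % limit > 0) sizes.Add(qty % limit);

        return sizes;
    }
}
```

Under these assumptions, `ChunkSizes.For(10, 4)` gives 4, 4, 2 and `ChunkSizes.For(3, 2)` gives 2, 1.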

Thanks

JKJKJK asked Jun 03 '10 16:06


3 Answers

Some of the other answers are making the LINQ query far more complex than it needs to be. Using a foreach loop is certainly faster and more efficient, but the LINQ alternative is still fairly straightforward.

var input = Enumerable.Range(1, 3).SelectMany(x => Enumerable.Repeat(x, 10));
int limit = 4;

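var query =
    input.GroupBy(x => x)
         // Number the items within each group; i / limit is the chunk index.
         .SelectMany(g => g.Select((x, i) => new { Val = x, Grp = i / limit }))
         // Anonymous types compare by value, so each (Val, Grp) pair forms one group.
         .GroupBy(x => x, x => x.Val)
         .Select(g => new { Qty = g.Count(), Num = g.Key.Val });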
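
The key step in the query above is `i / limit`: integer division maps positions 0..limit-1 within a group to chunk 0, the next `limit` positions to chunk 1, and so on. Here is a sketch of the same idea wrapped in a method so it can be run against unequal quantities. It assumes C# 7 or later for value tuples (used instead of anonymous types so the rows can be returned), and the names `IndexBucketSketch`, `Split` and `Bucket` are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexBucketSketch
{
    // Returns (Num, Qty) rows in which no Qty exceeds `limit`.
    public static List<(int Num, int Qty)> Split(IEnumerable<int> input, int limit)
    {
        return input
            .GroupBy(x => x)
            // Pair every item with the index of the chunk it falls into.
            .SelectMany(g => g.Select((x, i) => (Num: x, Bucket: i / limit)))
            // Value tuples compare by value, so equal (Num, Bucket) pairs group together.
            .GroupBy(t => t)
            .Select(g => (Num: g.Key.Num, Qty: g.Count()))
            .ToList();
    }
}
```

For the input 1, 1, 1, 2, 2, 2, 2, 2 with a limit of 2, this yields (1, 2), (1, 1), (2, 2), (2, 2), (2, 1). A limit of 0 throws a DivideByZeroException, so validate it first if it comes from outside.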

LukeH answered Nov 18 '22 01:11


A similar question came up recently asking how to do this in SQL. There's no really elegant solution, and unless this is LINQ to SQL or Entity Framework (i.e. being translated into a SQL query), I'd suggest not solving this problem with LINQ at all and writing an iterative solution instead; it's going to be a great deal more efficient and easier to maintain.

That said, if you absolutely must use a set-based ("Linq") method, this is one way you could do it:

var grouped =
    from n in nums
    group n by n into g
    select new { Num = g.Key, Qty = g.Count() };

int maxPerGroup = 2;
var portioned =
    from x in grouped
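    // Range over this group's own Qty; ranging to the overall maximum
    // emits extra rows for smaller groups when the quantities differ.
    from i in Enumerable.Range(1, x.Qty)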
    where (x.Qty % maxPerGroup) == (i % maxPerGroup)
    let tempQty = (x.Qty / maxPerGroup) == (i / maxPerGroup) ? 
        (x.Qty % maxPerGroup) : maxPerGroup
    select new
    {
        Num = x.Num,
        Qty = (tempQty > 0) ? tempQty : maxPerGroup
    };

Compare with the simpler and faster iterative version:

foreach (var g in grouped)
{
    int remaining = g.Qty;
    while (remaining > 0)
    {
        int allotted = Math.Min(remaining, maxPerGroup);
        yield return new MyGroup(g.Num, allotted);
        remaining -= allotted;
    }
}
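
The loop above uses `yield return`, so it has to live inside an iterator method, and `MyGroup` is not shown. A self-contained sketch, assuming value tuples (C# 7 or later) in place of `MyGroup` and grouping the raw numbers directly; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IterativeSplitSketch
{
    // Validates eagerly, then hands off to the lazy iterator below.
    public static IEnumerable<(int Num, int Qty)> Split(IEnumerable<int> nums, int maxPerGroup)
    {
        if (nums == null) throw new ArgumentNullException(nameof(nums));
        if (maxPerGroup <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerGroup));
        return SplitIterator(nums, maxPerGroup);
    }

    private static IEnumerable<(int Num, int Qty)> SplitIterator(IEnumerable<int> nums, int maxPerGroup)
    {
        foreach (var g in nums.GroupBy(n => n))
        {
            int remaining = g.Count();
            while (remaining > 0)
            {
                // Hand out at most maxPerGroup per row until the group is used up.
                int allotted = Math.Min(remaining, maxPerGroup);
                yield return (g.Key, allotted);
                remaining -= allotted;
            }
        }
    }
}
```

The argument checks sit in a separate non-iterator method so that a bad limit fails at the call site instead of on first enumeration. Without the check, a limit of 0 would make the `while` loop spin forever.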

Aaronaught answered Nov 18 '22 02:11


Aaronaught's excellent answer doesn't cover the possibility of getting the best of both worlds... using an extension method to provide an iterative solution.

Untested:

public static IEnumerable<IEnumerable<U>> SplitByMax<T, U>(
  this IEnumerable<T> source,
  int max,
  Func<T, int> maxSelector,
  Func<T, int, U> resultSelector
)
{
  foreach(T x in source)
  {
    int number = maxSelector(x);
    List<U> result = new List<U>();
    do
    {
      int allotted = Math.Min(number, max); 
      result.Add(resultSelector(x, allotted));
      number -= allotted;
    } while (number > 0 && max > 0);

    yield return result;
  }
}

Called by:

var query = grouped.SplitByMax(
  10,
  o => o.Qty,
  (o, i) => new {Num = o.Num, Qty = i}
)
.SelectMany(split => split);

Amy B answered Nov 18 '22 03:11