Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

LINQ query — Data aggregation (Group Adjacent)

Tags:

c#

linq

grouping

Let's take a class called Cls:

public class Cls
{
    public int SequenceNumber { get; set; }
    public int Value { get; set; }
}

Now, let's populate some collection with following elements:

Sequence
Number      Value
========    =====
1           9
2           9
3           15
4           15
5           15
6           30
7           9

What I need to do, is to enumerate over Sequence Numbers and check if the next element has the same value. If yes, values are aggregated and so, desired output is as following:

Sequence    Sequence
Number      Number
From        To          Value
========    ========    =====
1           2           9
3           5           15
6           6           30
7           7           9

How can I perform this operation using LINQ query?

like image 888
Dariusz Woźniak Avatar asked Feb 14 '13 16:02

Dariusz Woźniak


4 Answers

You can use Linq's GroupBy in a modified version which groups only if the two items are adjacent, then it's easy as:

var result = classes
    .GroupAdjacent(c => c.Value)
    .Select(g => new { 
        SequenceNumFrom = g.Min(c => c.SequenceNumber),
        SequenceNumTo = g.Max(c => c.SequenceNumber),  
        Value = g.Key
    });

foreach (var x in result)
    Console.WriteLine("SequenceNumFrom:{0} SequenceNumTo:{1} Value:{2}", x.SequenceNumFrom, x.SequenceNumTo, x.Value);

DEMO

Result:

SequenceNumFrom:1  SequenceNumTo:2  Value:9
SequenceNumFrom:3  SequenceNumTo:5  Value:15
SequenceNumFrom:6  SequenceNumTo:6  Value:30
SequenceNumFrom:7  SequenceNumTo:7  Value:9

This is the extension method to to group adjacent items:

public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        TKey last = default(TKey);
        bool haveLast = false;
        List<TSource> list = new List<TSource>();
        foreach (TSource s in source)
        {
            TKey k = keySelector(s);
            if (haveLast)
            {
                if (!k.Equals(last))
                {
                    yield return new GroupOfAdjacent<TSource, TKey>(list, last);
                    list = new List<TSource>();
                    list.Add(s);
                    last = k;
                }
                else
                {
                    list.Add(s);
                    last = k;
                }
            }
            else
            {
                list.Add(s);
                last = k;
                haveLast = true;
            }
        }
        if (haveLast)
            yield return new GroupOfAdjacent<TSource, TKey>(list, last);
    }
}

and the class used:

public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
    public TKey Key { get; set; }
    private List<TSource> GroupList { get; set; }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
    }
    System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
    {
        foreach (var s in GroupList)
            yield return s;
    }
    public GroupOfAdjacent(List<TSource> source, TKey key)
    {
        GroupList = source;
        Key = key;
    }
}
like image 151
Tim Schmelter Avatar answered Sep 21 '22 20:09

Tim Schmelter


You can use this linq query

Demo

var values = (new[] { 9, 9, 15, 15, 15, 30, 9 }).Select((x, i) => new { x, i });

var query = from v in values
            let firstNonValue = values.Where(v2 => v2.i >= v.i && v2.x != v.x).FirstOrDefault()
            let grouping = firstNonValue == null ? int.MaxValue : firstNonValue.i
            group v by grouping into v
            select new
            {
              From = v.Min(y => y.i) + 1,
              To = v.Max(y => y.i) + 1,
              Value = v.Min(y => y.x)
            };
like image 24
Aducci Avatar answered Sep 20 '22 20:09

Aducci


MoreLinq provides this functionality out of the box

It's called GroupAdjacent and is implemented as extension method on IEnumerable:

Groups the adjacent elements of a sequence according to a specified key selector function.

enumerable.GroupAdjacent(e => e.Key)

There is even a Nuget "source" package that contains only that method, if you don't want to pull in an additional binary Nuget package.

The method returns an IEnumerable<IGrouping<TKey, TValue>>, so its output can be processed in the same way output from GroupBy would be.

like image 39
theDmi Avatar answered Sep 21 '22 20:09

theDmi


You can do it like this:

var all = new [] {
    new Cls(1, 9)
,   new Cls(2, 9)
,   new Cls(3, 15)
,   new Cls(4, 15)
,   new Cls(5, 15)
,   new Cls(6, 30)
,   new Cls(7, 9)
};
var f = all.First();
var res = all.Skip(1).Aggregate(
    new List<Run> {new Run {From = f.SequenceNumber, To = f.SequenceNumber, Value = f.Value} }
,   (p, v) => {
    if (v.Value == p.Last().Value) {
        p.Last().To = v.SequenceNumber;
    } else {
        p.Add(new Run {From = v.SequenceNumber, To = v.SequenceNumber, Value = v.Value});
    }
    return p;
});
foreach (var r in res) {
    Console.WriteLine("{0} - {1} : {2}", r.From, r.To, r.Value);
}

The idea is to use Aggregate creatively: starting with a list consisting of a single Run, examine the content of the list we've got so far at each stage of aggregation (the if statement in the lambda). Depending on the last value, either continue the old run, or start a new one.

Here is a demo on ideone.

like image 22
Sergey Kalinichenko Avatar answered Sep 20 '22 20:09

Sergey Kalinichenko