Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split a list into sublist by checking a condition on elements

Suppose I have an array of integeres and I want to split it into several parts, and I want to use zero as the condition on when to break. Something like this :

[1,2,3,0,4,5,0,6,7] => [[1,2,3,0], [4,5,0], [6,7]]

Well, it can be easily done using two for loops, but I wanted to know if it's possible to do this with LINQ.

There's a couple of question like this[1],[2], but as opposed to this one, they rely on a condition that's been provided from outside of the list.

Note: I know it's not polite to ask more than one question in a thread, but if anyone out there is familiar with functional programming ( since in it's essence, it's really an FP question ), I'd also like to see their perspective and possible solutions for this problem.

like image 998
Rsh Avatar asked Apr 03 '14 20:04

Rsh


3 Answers

You have a dependency between separate elements of your collection, specifically, for each element you want to know "was the previous element a zero?". As soon as your query depends on the previous element (or, more generally, as soon as your query depends on other elements of the same sequence), you should reach for Aggregate (or in more general functional programming terms, fold). This is because Aggregate, unlike other LINQ operators, allows you to carry state with you from one iteration to the next.

So, to answer your question, I'd write this query as follows in LINQ.

// assume our list of integers it called values
var splitByZero = values.Aggregate(new List<List<int>>{new List<int>()},
                                   (list, value) => {
                                       list.Last().Add(value);
                                       if (value == 0) list.Add(new List<int>());
                                       return list;
                                   });

I'll break this down into parts so I can better explain my thinking.

values.Aggregate(new List<List<int>>{new List<int>()},

As I said before, reach for Aggregate because we need to carry state. Putting a new empty list into our List of Lists removes an edge case of the List<List<int>> having no lists in it.

(list, value) => {...}

Again, looking at the signature of our lambda expression (which is Func<List<List<int>>, int, List<List<int>>), we can see the state passing explicitly: we accept a List<List<int>> and return the same.

list.Last().Add(value);

Since we always want to work on the most recent List<int>, we get the Last() element of our list of lists (which will never be null due to the section above).

if (value == 0) list.Add(new List<int>());

This is where we do the splitting - on the next iteration, the call to Last() will return this new list.

return list;

And we finally pass the state on to the next iteration.


This could be generalized with little difficulty, in a SplitOn method, as follows:

public static IEnumerable<IEnumerable<T>> SplitOn<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    return source.Aggregate(new List<List<T>> {new List<T>()},
                            (list, value) =>
                                {
                                    list.Last().Add(value);
                                    if (predicate(value)) list.Add(new List<T>());
                                    return list;
                                });
}

The version using IEnumerable's instead of List's is somewhat less clear because of the way Enumerables work, but again, isn't particularly hard to create from the above code, and looks like (simplified just a touch via a ternary operator):

public static IEnumerable<IEnumerable<T>> SplitOn<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    return source.Aggregate(Enumerable.Repeat(Enumerable.Empty<T>(), 1),
                            (list, value) =>
                                {
                                    list.Last().Concat(Enumerable.Repeat(value, 1));
                                    return predicate(value) ? list.Concat(Enumerable.Repeat(Enumerable.Empty<T>(), 1)) : list;
                                });
}

You also might find Haskell's implementation of splitOn interesting, as it does exactly what you want. I would call it nontrivial (to put it lightly).

like image 155
Zack Butcher Avatar answered Nov 09 '22 07:11

Zack Butcher


Here's an extension that helps:

public static IEnumerable<Tuple<TIn, int>> MarkWithLabels<TIn>(this IEnumerable<TIn> src, Predicate<TIn> splittingCondition)
{
    int label = 0;
    foreach (TIn item in src)
    {
        yield return new Tuple<TIn, int>(item, label);
        if (splittingCondition(item))
            label++;
    }
}

With it, the following does the trick

int breakingValue = 0;
var subseq = seq.MarkWithLabels(i => i == breakingValue)
    .GroupBy(tup => tup.Item2)
    .Select(group => group.Select(tup => tup.Item1).ToArray())
    .ToArray();

FP solution could be basically the same, except the foreach.

like image 40
voidengine Avatar answered Nov 09 '22 07:11

voidengine


I've compiled two extension methods based entirely on Zack's answer.

public static IEnumerable<List<T>> SplitBefore<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    return source.Aggregate(
        Enumerable.Repeat(new List<T>(), 1),
        (list, value) =>
        {
            if (predicate(value))
                list = list.Concat(Enumerable.Repeat(new List<T>(), 1));
            list.Last().Add(value);
            return list;
        }
    )
    .Where(list => list.Any());
}


public static IEnumerable<List<T>> SplitAfter<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    return source.Aggregate(
        Enumerable.Repeat(new List<T>(), 1),
        (list, value) =>
        {
            list.Last().Add(value);
            return predicate(value)
                ? list.Concat(Enumerable.Repeat(new List<T>(), 1))
                : list;
        }
    )
    .Where(list => list.Any());
}
like image 1
Robert Synoradzki Avatar answered Nov 09 '22 07:11

Robert Synoradzki