Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently collapse IEnumerable values together

Tags:

arrays

c#

I'd like to 'collapse' the values in some IEnumerable together such that adjacent elements that are the same get collapsed to a single element.

I can't think of a better way to describe the problem other than an example:

The array [0,0,2,0,1,1,2,2,2,1,0,0,2,1,1,0,1,1,1] Should become [0,2,0,1,2,1,0,2,1,0,1]

In my use case, this needs to happen in a critical loop and therefore must be as fast as possible. I could loop through the array and check each element against the previous, removing if it's a duplicate, but I'm hoping there's a faster way.

My use will only be for relatively short arrays (<100 elements) and only uses int, however a general purpose solutions would be appreciated.

EDIT: As pointed out below, the problem is fundamentally O(n) complexity, but I was hoping something linqy might beat my (probably clumsy) implementation.


2 Answers

If you want a general-purpose solution, write an extension method:

This should do the job just fine:

public static IEnumerable<T> DistinctConsecutive<T>(this IEnumerable<T> sequence)
    => sequence.DistinctConsecutive(EqualityComparer<T>.Default);

public static IEnumerable<T> DistinctConsecutive<T>(this IEnumerable<T> sequence, IEqualityComparer<T> comparer)
{
    if (sequence == null)
        throw new ArgumentNullException(nameof(sequence));
    if (comparer == null)
        throw new ArgumentNullException(nameof(comparer));

    return DistinctConsecutiveImpl(sequence, comparer);
}

private static IEnumerable<T> DistinctConsecutiveImpl<T>(IEnumerable<T> sequence, IEqualityComparer<T> comparer)
{
    using (var enumerator = sequence.GetEnumerator())
    {
        if (!enumerator.MoveNext())
            yield break;

        var lastValue = enumerator.Current;
        yield return lastValue;

        while (enumerator.MoveNext())
        {
            var value = enumerator.Current;
            if (comparer.Equals(lastValue, value))
                continue;

            yield return value;
            lastValue = value;
        }
    }
}

Or, the lazier approach:

public static IEnumerable<T> DistinctConsecutive<T>(this IEnumerable<T> sequence, IEqualityComparer<T> comparer = null)
{
    if (comparer == null)
        comparer = EqualityComparer<T>.Default;

    using (var enumerator = sequence.GetEnumerator())
    {
        if (!enumerator.MoveNext())
            yield break;

        var lastValue = enumerator.Current;
        yield return lastValue;

        while (enumerator.MoveNext())
        {
            var value = enumerator.Current;
            if (comparer.Equals(lastValue, value))
                continue;

            yield return value;
            lastValue = value;
        }
    }
}

If you need an optimized solution, throw out generics and use == instead of the IEqualityComparer<T>. And if this is still a bottleneck, do it with a plain old for loop.

like image 63
Lucas Trzesniewski Avatar answered May 04 '26 04:05

Lucas Trzesniewski


I could loop through the array and check each element against the previous, removing if it's a duplicate, but I'm hoping there's a faster way.

Fundamentally there's never going to be a way that's faster algorithmically. There can be subtle differences in implementations using that same algorithm, but that's the best that you can get. There's no way to avoid checking each item, so the operation will be O(n) no matter what you do.

like image 29
Servy Avatar answered May 04 '26 03:05

Servy



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!