Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Enumerable.Single() iterate all elements, even when more than one item has already been found?

Tags:

c#

linq

.net-4.0

When profiling one of our applications, we discovered a mysterious slowdown in some code where we were calling Enumerable.Single(source, predicate) for a large collection that had more than one item that matched the predicate near the start of the collection.

Investigation revealed that the implementation of Enumerable.Single() is as follows:

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
        TSource result = default(TSource);
        long count = 0;
        // Note how this always iterates through ALL the elements:
        foreach (TSource element in source) { 
            if (predicate(element)) {
                result = element;
                checked { count++; }
            }
        }
        switch (count) {
            case 0: throw Error.NoMatch();
            case 1: return result;
        }
        throw Error.MoreThanOneMatch();
    }

That implementation will iterate through every element of the sequence, even if more than one element has already matched the predicate.

The following implementation would appear to yield the same results:

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    TSource result = default(TSource);
    long count = 0;
    foreach (TSource element in source) {
        if (predicate(element)) {
            if (count == 1) // Exit loop immediately if more than one match found.
                throw Error.MoreThanOneMatch();

            result = element;
            count++; // "checked" is no longer needed.
        }
    }

    if (count == 0)
        throw Error.NoMatch();

    return result;
}

Does anyone know why the actual implementation doesn't use this obvious optimization? Is there something I'm missing? (I can't imagine that such an obvious optimization would be overlooked, and therefore there must be some concrete reason for it.)

(Note: I realize that this question may attract answers that are opinions; I'm hoping for answers that provide concrete reasons for iterating all elements. If the answer is actually "because the designers didn't think such an optimization was necessary", then this question is unanswerable and I guess I should just delete it...)


For comparison, look at the implementation of Single() which does not take a predicate:

public static TSource Single<TSource>(this IEnumerable<TSource> source) 
{
    IList<TSource> list = source as IList<TSource>;
    if (list != null) {
        switch (list.Count) {
            case 0: throw Error.NoElements();
            case 1: return list[0];
        }
    }
    else {
        using (IEnumerator<TSource> e = source.GetEnumerator()) {
            if (!e.MoveNext()) throw Error.NoElements();
            TSource result = e.Current;
            if (!e.MoveNext()) return result;
        }
    }
    throw Error.MoreThanOneElement();
}

In this case, they've gone to the effort of adding an optimisation for IList.

like image 842
Matthew Watson Avatar asked Oct 29 '18 09:10

Matthew Watson


3 Answers

You didn't seem to be the only one thinking that. The .NET Core implementation has an optimized version:

using (IEnumerator<TSource> e = source.GetEnumerator())
{
    while (e.MoveNext())
    {
        TSource result = e.Current;
        if (predicate(result))
        {
            while (e.MoveNext())
            {
                if (predicate(e.Current))
                {
                    throw Error.MoreThanOneMatch();
                }
            }

            return result;
        }
    }
}

So to answer your question: there doesn't seem to be a 'good' reason, other than just a developer not thinking about optimizing this use case.

like image 66
Patrick Hofman Avatar answered Nov 13 '22 16:11

Patrick Hofman


The optimization was applied in .NET Core

The code now is :

public static TSource Single<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null)
    {
        throw Error.ArgumentNull(nameof(source));
    }

    if (predicate == null)
    {
        throw Error.ArgumentNull(nameof(predicate));
    }

    using (IEnumerator<TSource> e = source.GetEnumerator())
    {
        while (e.MoveNext())
        {
            TSource result = e.Current;
            if (predicate(result))
            {
                while (e.MoveNext())
                {
                    if (predicate(e.Current))
                    {
                        throw Error.MoreThanOneMatch();
                    }
                }

                return result;
            }
        }
    }

    throw Error.NoMatch();
}

Wherever possible, the code even checks whether the target is an IList<T> so it can avoid iterating:

public static TSource Single<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull(nameof(source));
    }

    if (source is IList<TSource> list)
    {
        switch (list.Count)
        {
            case 0:
                throw Error.NoElements();
            case 1:
                return list[0];
        }
    }
    else
    {
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                throw Error.NoElements();
            }

            TSource result = e.Current;
            if (!e.MoveNext())
            {
                return result;
            }
        }
    }

    throw Error.MoreThanOneElement();
}

UPDATE

Checking the git blame output shows that the iteration optimization was applied back in 2016!

The IList<> optimization was added 1 year ago, probably as part of the Core 2.1 optimizations

like image 44
Panagiotis Kanavos Avatar answered Nov 13 '22 16:11

Panagiotis Kanavos


As the other answers have pointed out, the optimization has been applied, but I would just like to raise the hypothesis that they had done it that way originally thinking of the fact that they have no way to guarantee that the predicate function does not have side effects.

I'm not sure that there would truly be a case where such behavior would be used/useful, but it is a consideration to keep in mind.

like image 5
TurtleKwitty Avatar answered Nov 13 '22 16:11

TurtleKwitty