Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count an IOrderedEnumerable without consuming it

What I want to do, short version:

var source = new[]{2,4,6,1,9}.OrderBy(x=>x);
int count = source.Count; // <-- get the number of elements without performing the sort

Long version:

To determine the number of elements in an IEnumerable, it is neccessary to iterate over all elements. This could potentially be a very expensive operation.

If the IEnumerable can be cast to ICollection, then the count can be determined quickly without iterating. The LINQ Count() method does this automatically.

The function myEnumerable.OrderBy() returns an IOrderedEnumerable. An IOrderedEnumerable can obviously not be cast to ICollection, so calling Count() will consume the whole thing.

But sorting does not change the number of elements, and an IOrderedEnumerable has to keep a reference to its source. So if that source is an ICollection, it should be possible to determine the count from the IOrderedEnumerable without consuming it.

My goal is to have a library method, that takes an IEnumerable with n elements, and then for example retrieves the element at position n/2;

I want to avoid iterating over the IEnumerable twice just to get its count, but I also want to avoid creating an unnecessary copy if at all possible.


Here is a skeleton of the function I want to create

public void DoSomething(IEnumerable<T> source)
{
    int count; // What we do with the source depends on its length

    if (source is ICollection)
    {
        count = source.Count(); // Great, we can use ICollection.Count
    }
    else if (source is IOrderedEnumerable)
    {
        // TODO: Find out whether this is based on an ICollection, 
        // TODO: then determine the count of that ICollection
    }
    else
    {
        // Iterating over the source may be expensive, 
        // to avoid iterating twice, make a copy of the source
        source = source.ToList();
        count = source.Count();
    }

    // do some stuff

}
like image 932
HugoRune Avatar asked Jul 05 '13 16:07

HugoRune


1 Answers

Let's think what this code actually looks like:

var source = new[]{ 2, 4, 6, 1, 9 }.OrderBy(x => x);
int count = source.Count();

It is same as

int count = Enumerable.Count(Enumerable.OrderBy(new[]{ 2, 4, 6, 1, 9 }, x => x));

Result of Enumerable.OrderBy(new[]{ 2, 4, 6, 1, 9 }, x => x) is passed into Count extension. You cannot avoid OrderBy execution. And thus it is non-streaming operator, it consumes all source before returning something, which will be passed to Count.

So, the only way to avoid iterating over all collection, is avoiding OrderBy - count items before sorting.


UPDATE: You can call this extension method on any OrderedEnumerable - it will use reflection to get source field of OrderedEnumerable<T> which holds source sequence. Then check if this sequence is collection, and use Count without executing ordering:

public static class Extensions
{
    public static int Count<T>(this IOrderedEnumerable<T> ordered)
    {
        // you can check if ordered is of type OrderedEnumerable<T>
        Type type = ordered.GetType();
        var flags = BindingFlags.NonPublic | BindingFlags.Instance;
        var field = type.GetField("source", flags);
        var source = field.GetValue(ordered);
        if (source is ICollection<T>)
            return ((ICollection<T>)source).Count;

        return ordered.Count();
    }
}

Usage:

var source = new[]{ 2, 4, 6, 1, 9 }.OrderBy(x => x);
int count = source.Count();
like image 179
Sergey Berezovskiy Avatar answered Sep 30 '22 17:09

Sergey Berezovskiy