Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does System.Linq.Enumerable.Reverse copy all elements internally to an array?

Some years back, somebody complained about the implementation of Linq.Reverse() and Microsoft promised to fix that. This was in 2008, so the question is, does Framework 4 have an optimized implementation of Linq.Reverse() that does not materialize the collection (i.e. copy all elements to an internal array) when the collection type allows it (e.g. IList<T>)?

like image 388
Diego Avatar asked Feb 18 '12 00:02

Diego


People also ask

What is Linq enumerable?

The most basic enumerable is an array, which can store a fixed number of typed elements. int[] numbers = new int[3] { 1, 2, 3}; In the example above, we create an integer array that can hold 3 values and initialize the array with the values 1, 2 and 3. We assign that array to a variable of type int[] named numbers .

What is Tolist C#?

This extension method converts collections (IEnumerables) to List instances. It is fast and easy-to-remember. It returns a List instance with the appropriate elements.


2 Answers

Obviously it's not possible to optimize all cases. If some object implements only IEnumerable<T> and not IList<T>, you have to iterate it until the end to find the last element. So the optimization would be only for types that implement IList<T> (like T[] or List<T>).

Now, is it actually optimized in .Net 4.5 DP? Let's fire up Reflector ILSpy:

public static IEnumerable<TSource> Reverse<TSource>(
    this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    return ReverseIterator<TSource>(source);
}

Okay, how does ReverseIterator<TSource>() look?

private static IEnumerable<TSource> ReverseIterator<TSource>(
    IEnumerable<TSource> source)
{
    Buffer<TSource> buffer = new Buffer<TSource>(source);
    for (int i = buffer.count - 1; i >= 0; i--)
    {
        yield return buffer.items[i];
    }
    yield break;
}

What that iterator block does is to create a Buffer<T> for the collection and iterate backwards through that. We're almost there, what's Buffer<T>?

[StructLayout(LayoutKind.Sequential)]
internal struct Buffer<TElement>
{
    internal TElement[] items;
    internal int count;
    internal Buffer(IEnumerable<TElement> source)
    {
        TElement[] array = null;
        int length = 0;
        ICollection<TElement> is2 = source as ICollection<TElement>;
        if (is2 != null)
        {
            length = is2.Count;
            if (length > 0)
            {
                array = new TElement[length];
                is2.CopyTo(array, 0);
            }
        }
        else
        {
            foreach (TElement local in source)
            {
                if (array == null)
                {
                    array = new TElement[4];
                }
                else if (array.Length == length)
                {
                    TElement[] destinationArray = new TElement[length * 2];
                    Array.Copy(array, 0, destinationArray, 0, length);
                    array = destinationArray;
                }
                array[length] = local;
                length++;
            }
        }
        this.items = array;
        this.count = length;
    }

    // one more member omitted
}

What have we here? We copy the content to an array. In every case. The only optimization is that if we know Count (that is, the collection implements ICollection<T>), we don't have to reallocate the array.

So, the optimization for IList<T> is not in .Net 4.5 DP. It creates a copy of the whole collection in every case.

If I were to guess why it isn't optimized, after reading Jon Skeet's article on this issue, I think it's because that optimization is observable. If you mutate the collection while iterating, you would see the changed data with the optimization, but the old data without it. And optimizations that actually change behavior of something in subtle ways are a bad thing, because of backwards compatibility.

like image 60
svick Avatar answered Oct 20 '22 04:10

svick


EDIT: Yes, it appears that this change has been made

The bug report you linked to marks the bug as Fixed, but I wanted to make sure for myself. So, I wrote this little program:

static void Main(string[] args)
{
    List<int> bigList = Enumerable.Range(0, 100000000).ToList();

    Console.WriteLine("List allocated");
    Console.ReadKey();

    foreach (int n in bigList.Reverse<int>())
    {
        // This will never be true, but the loop ensures that we enumerate
        // through the return value of Reverse()
        if (n > 100000000)
            Console.WriteLine("{0}", n);
    }
}

The idea is that the program allocates 400 MB of space into bigList, then waits for the user to press a key, then calls Enumerable.Reverse(bigList) via extension method syntax.

I tested this program with a Debug build on a Windows 7 x64 machine. My memory usage before starting the program is exactly 2.00 GB, according to Task Manager. Then, before I hit a key, memory usage reaches 2.63 GB. After I hit the key, memory usage briefly spikes to 2.75 GB. Importantly, though, it does not spike by 400 MB or more, which would be the case if Enumerable.Reverse() were making a copy.

ORIGINAL POST

It's impossible for a correct Enumerable.Reverse() implementation not to copy to an array or other data structure in some situations.

The complaint you link to deals only with IList<T>. In the general case, though, I argue that Enumerable.Reverse() must copy the elements to some internal buffer.

Consider the following method

private int x = 0;

public IEnumerable<int> Foo()
{
    for (int n = 0; n < 1000; n++)
    {
        yield return n;
        x++;
    }
}

Now let's say that Enumerable.Reverse() didn't copy the input IEnumerable<T> to a buffer in this case. Then, the loop

foreach (int n in Foo().Reverse())
    Console.WriteLine("{0}", n);

would iterate all the way through the iterator block to get the first n, all the way through the first 999 elements to get the second n, and so on. But this wouldn't have the same effect on x as forward iteration, because we would be mutating x every time we iterated almost all the way through the return value of Foo(). In order to prevent this disconnect between forward and reverse iteration, the Enumerable.Reverse() method must make a copy of the input IEnumerable<T>.

like image 26
Adam Mihalcin Avatar answered Oct 20 '22 03:10

Adam Mihalcin