Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List<T> and IEnumerable difference

While implementing this generic merge sort, as a kind of Code Kata, I stumbled on a difference between IEnumerable and List that I need help to figure out.

Here's the MergeSort

public class MergeSort<T>
{
    public IEnumerable<T> Sort(IEnumerable<T> arr)
    {
        if (arr.Count() <= 1) return arr;

        int middle = arr.Count() / 2;
        var left = arr.Take(middle).ToList();
        var right = arr.Skip(middle).ToList();
        return Merge(Sort(left), Sort(right));
    }

    private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
    {
        var arrSorted = new List<T>();

        while (left.Count() > 0 && right.Count() > 0)
        {
            if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0)
            {
                arrSorted.Add(left.First());
                left=left.Skip(1);
            }
            else
            {
                arrSorted.Add(right.First());  
                right=right.Skip(1);  
            }
        }

        return arrSorted.Concat(left).Concat(right);
    }
}

If I remove the .ToList() on the left and right variables it fails to sort correctly. Do you see why?

Example

var ints = new List<int> { 5, 8, 2, 1, 7 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);

With .ToList()

    [0]: 1
    [1]: 2
    [2]: 5
    [3]: 7
    [4]: 8

Without .ToList()

    [0]: 1
    [1]: 2
    [2]: 5
    [3]: 7
    [4]: 2

Edit

It was my stupid test that got me.

I tested it like this:

var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");

just changing the first row to

var sortedInts = mergeSortInt.Sort(ints).ToList();

removes the problem (and the lazy evaluation).

EDIT 2010-12-29

I thought I would figure out just how the lazy evaluation messes things up here but I just don't get it.

Remove the .ToList() in the Sort method above like this

var left = arr.Take(middle);
var right = arr.Skip(middle);

then try this

var ints = new List<int> { 5, 8, 2 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");

When debugging You can see that before ints.Sort() a sortedInts.ToList() returns

[0]: 2
[1]: 5
[2]: 8

but after ints.Sort() it returns

[0]: 2
[1]: 5
[2]: 5

What is really happening here?

like image 951
Jonas Elfström Avatar asked Dec 28 '10 09:12

Jonas Elfström


1 Answers

Your function is correct - if you inspect the result of Merge, you'll see the result is sorted (example).
So where's the problem? Just as you've suspected, you're testing it wrong - when you call Sort on your original list you change all collections that derive from it!
Here's a snippet that demonstrates what you did:

List<int> numbers = new List<int> {5, 4};
IEnumerable<int> first = numbers.Take(1);
Console.WriteLine(first.Single()); //prints 5
numbers.Sort();
Console.WriteLine(first.Single()); //prints 4!

All collections you create are basically the same as first - in a way, they are lazy pointers to positions in ints. Obviously, when you call ToList, the problem is eliminated.

Your case is more complex than that. Your Sort is part lazy, exactly as you suggest: First you create a list (arrSorted) and add integers to it. That part isn't lazy, and is the reason you see the first few elements sorted. Next, you add the remaining elements - but Concat is lazy. Now, recursion enters to mess this even more: In most cases, most elements on your IEnumerable are eager - you create lists out of left and right, which are also made of mostly eager + lazy tail. You end up with a sorted List<int>, lazily concated to a lazy pointer, which should be just the last element (other elements were merged before).
Here's a call graph of your functions - red indicated a lazy collection, black a real number:

alt text

When you change the list the new list is mostly intact, but the last element is lazy, and point to the position of the largest element in the original list.

The result is mostly good, but its last element still points to the original list:

alt text

One last example: consider you're changing all elements in the original list. As you can see, most elements in the sorted collection remain the same, but the last is lazy and points to the new value:

var ints = new List<int> { 3,2,1 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
// sortedInts is { 1, 2, 3 }
for(int i=0;i<ints.Count;i++) ints[i] = -i * 10;
// sortedInts is { 1, 2, 0 }

Here's the same example on Ideone: http://ideone.com/FQVR7

like image 179
Kobi Avatar answered Oct 20 '22 22:10

Kobi