While implementing this generic merge sort, as a kind of Code Kata, I stumbled on a difference between IEnumerable and List that I need help to figure out.
Here's the MergeSort
public class MergeSort<T>
{
public IEnumerable<T> Sort(IEnumerable<T> arr)
{
if (arr.Count() <= 1) return arr;
int middle = arr.Count() / 2;
var left = arr.Take(middle).ToList();
var right = arr.Skip(middle).ToList();
return Merge(Sort(left), Sort(right));
}
private static IEnumerable<T> Merge(IEnumerable<T> left, IEnumerable<T> right)
{
var arrSorted = new List<T>();
while (left.Count() > 0 && right.Count() > 0)
{
if (Comparer<T>.Default.Compare(left.First(), right.First()) < 0)
{
arrSorted.Add(left.First());
left=left.Skip(1);
}
else
{
arrSorted.Add(right.First());
right=right.Skip(1);
}
}
return arrSorted.Concat(left).Concat(right);
}
}
If I remove the .ToList()
on the left
and right
variables it fails to sort correctly. Do you see why?
Example
var ints = new List<int> { 5, 8, 2, 1, 7 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
With .ToList()
[0]: 1 [1]: 2 [2]: 5 [3]: 7 [4]: 8
Without .ToList()
[0]: 1 [1]: 2 [2]: 5 [3]: 7 [4]: 2
Edit
It was my stupid test that got me.
I tested it like this:
var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");
just changing the first row to
var sortedInts = mergeSortInt.Sort(ints).ToList();
removes the problem (and the lazy evaluation).
EDIT 2010-12-29
I thought I would figure out just how the lazy evaluation messes things up here but I just don't get it.
Remove the .ToList()
in the Sort method above like this
var left = arr.Take(middle);
var right = arr.Skip(middle);
then try this
var ints = new List<int> { 5, 8, 2 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
ints.Sort();
if (Enumerable.SequenceEqual(ints, sortedInts)) Console.WriteLine("ints sorts ok");
When debugging You can see that before ints.Sort()
a sortedInts.ToList()
returns
[0]: 2
[1]: 5
[2]: 8
but after ints.Sort()
it returns
[0]: 2
[1]: 5
[2]: 5
What is really happening here?
Your function is correct - if you inspect the result of Merge
, you'll see the result is sorted (example).
So where's the problem? Just as you've suspected, you're testing it wrong - when you call Sort
on your original list you change all collections that derive from it!
Here's a snippet that demonstrates what you did:
List<int> numbers = new List<int> {5, 4};
IEnumerable<int> first = numbers.Take(1);
Console.WriteLine(first.Single()); //prints 5
numbers.Sort();
Console.WriteLine(first.Single()); //prints 4!
All collections you create are basically the same as first
- in a way, they are lazy pointers to positions in ints
. Obviously, when you call ToList
, the problem is eliminated.
Your case is more complex than that. Your Sort
is part lazy, exactly as you suggest: First you create a list (arrSorted
) and add integers to it. That part isn't lazy, and is the reason you see the first few elements sorted. Next, you add the remaining elements - but Concat
is lazy. Now, recursion enters to mess this even more: In most cases, most elements on your IEnumerable
are eager - you create lists out of left and right, which are also made of mostly eager + lazy tail. You end up with a sorted List<int>
, lazily concated to a lazy pointer, which should be just the last element (other elements were merged before).
Here's a call graph of your functions - red indicated a lazy collection, black a real number:
When you change the list the new list is mostly intact, but the last element is lazy, and point to the position of the largest element in the original list.
The result is mostly good, but its last element still points to the original list:
One last example: consider you're changing all elements in the original list. As you can see, most elements in the sorted collection remain the same, but the last is lazy and points to the new value:
var ints = new List<int> { 3,2,1 };
var mergeSortInt = new MergeSort<int>();
var sortedInts = mergeSortInt.Sort(ints);
// sortedInts is { 1, 2, 3 }
for(int i=0;i<ints.Count;i++) ints[i] = -i * 10;
// sortedInts is { 1, 2, 0 }
Here's the same example on Ideone: http://ideone.com/FQVR7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With