Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does LINQ's OrderBy jive with MoveNext?

Tags:

c#

.net

linq

This thread says that LINQ's OrderBy uses Quicksort. I'm struggling how that makes sense given that OrderBy returns an IEnumerable.

Let's take the following piece of code for example.

int[] arr = new int[] { 1, -1, 0, 60, -1032, 9, 1 }; 
var ordered = arr.OrderBy(i => i); 
foreach(int i in ordered)
    Console.WriteLine(i); 

The loop is the equivalent of

var mover = ordered.GetEnumerator();
while(mover.MoveNext())
    Console.WriteLine(mover.Current);

The MoveNext() returns the next smallest element. The way that LINQ works, unless you "cash out" of the query by use ToList() or similar, there are not supposed to be any intermediate lists created, so each time you call MoveNext() the IEnumerator finds the next smallest element. That doesn't make sense because during the execution of Quicksort there is no concept of a current smallest and next smallest element.

Where is the flaw in my thinking here?

like image 767
user6048670 Avatar asked Dec 14 '22 05:12

user6048670


1 Answers

the way that LINQ works, unless you "cash out" of the query by use ToList() or similar, there are not supposed to be any intermediate lists created

This statement is false. The flaw in your thinking is that you believe a false statement.

The LINQ to Objects implementation is smart about deferring work when possible at a reasonable cost. As you correctly note, it is not possible in the case of sorting. OrderBy produces as its result an object which, when MoveNext is called, enumerates the entire source sequence, generates the sorted list in memory and then enumerates the sorted list.

Similarly, joining and grouping also must enumerate the whole sequence before the first element is enumerated. (Logically, a join is just a cross product with a filter, and the work could be spread out over each MoveNext() but that would be inefficient; for practicality, a lookup table is built. It is educational to work out the asymptotic space vs time tradeoff; give it a shot.)

The source code is available; I encourage you to read it if you have questions about the implementation. Or check out Jon's "edulinq" series.

like image 137
Eric Lippert Avatar answered Dec 31 '22 22:12

Eric Lippert