Just for fun I built a quicksort implementation in C# with Linq:
public static IEnumerable<T> quicksort<T>(IEnumerable<T> input) where T : IComparable<T>{
if (input.Count() <= 1) return input;
var pivot = input.FirstOrDefault();
var lesser = quicksort(input.Skip(1).Where(i => i.CompareTo(pivot) <= 0));
var greater = quicksort(input.Where(i => i.CompareTo(pivot) > 0));
return lesser.Append(pivot).Concat(greater);
}
It sorts 10000 random integers in about 13 seconds.
Changing it to use int[] instead of List results in about 700 times better performance! It only takes 21ms to sort the same 10000 random integers.
public static T[] quicksortArray<T>(T[] input) where T : IComparable<T>{
if (input.Count() <= 1) return input;
var pivot = input.FirstOrDefault();
var lesser = quicksortArray(input.Skip(1).Where(i => i.CompareTo(pivot) <= 0).ToArray());
var greater = quicksortArray(input.Where(i => i.CompareTo(pivot) > 0).ToArray());
return lesser.Append(pivot).Concat(greater).ToArray();
}
Just looking at the code I would have assumed this would have worse performance. I assumed that .ToArray() would create an additional array in memory and copy all integers there. I think passing an array vs. passing a list should take about the same time.
So where does this huge performance difference come from?
This is why you should be very careful about iterating an IEnumerable
multiple times.
The first time you call quicksort
you pass in, say, List
. It calls quicksort
two more times, in each of those cases, the IEnumerable
that you're passing in represents a query that will skip the first item and then perform a comparison on every single item after that. You then take that query and pass it to two more instances of quicksort
, but making the query not only skip the first item and compare every item after it, but then skip the first item of the results of that query and then compare every item after that to something. This means that by the time you finally reach a value, you've got a query that represents log(n) skips, and is comparing every item in the sequence to a value log(n) times.
In your second implementation you're not passing in queries to quicksort
, you're evaluating those queries to their values and passing in the values to the operation, which can then use those values twice, rather than continually performing an extremely complex query multiple times.
The thing about linq queries are they are lazy, they won't be evaluated until you call a method like ToArray
or ToList
. In your first code for example, this query:
input.Skip(1).Where(i => i.CompareTo(pivot))
Will be evaluated at least twice every time you call quicksort
, once for Count()
and once for FirstOrDefault
. Which means the filtering operation will be repeated for each call over and over. When you use ToArray
on the other hand, since you have the filtered elements in an array already, Where
won't be executed every time, it will be executed once you call the ToArray
and that's it. This is the difference between the codes, based on this you can do the math for the other parts.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With