There was a question asked about how to sort a List. Several methods were suggested, from the basic List.Sort() to List.OrderBy(). The most laughable was a roll-your-own SelectionSort. I promptly voted that down, but it made me think: wouldn't Linq's OrderBy(), applied to a list, do the same thing? myList.OrderBy(x => x.Property).ToList() would produce an iterator that basically finds the minimum value of the projection in what's left of the collection and yield returns it. Applied over the entire list, that's a selection sort.
Which made me wonder: what algorithms do the built-in sorters for List, SortedList, Enumerable, etc. use, and by extension, should any of them be avoided for large collections? A SortedList, since it stays sorted by key, would probably do a single insertion-sort pass on each Add: find the first index with a key greater than the new one, and insert before it. Lists and Arrays probably merge-sort themselves fairly efficiently, but I don't know the actual algorithm behind Sort(). We've discussed OrderBy.
What I've described above would seem to indicate that List.Sort() or Array.Sort() is the best option for a collection of known size, and that using Linq to sort an in-memory list or array should be discouraged. For a stream, there really isn't any way other than OrderBy() on the enumerable; the performance loss is mitigated by the fact that you can keep the data as a stream instead of having to materialize it all before sorting it.
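For concreteness, here's a minimal sketch of the two calls being compared; the Widget type and its Name property are just placeholders I made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Widget
{
    public string Name { get; set; }
}

class Program
{
    static void Main()
    {
        var myList = new List<Widget>
        {
            new Widget { Name = "gamma" },
            new Widget { Name = "alpha" },
            new Widget { Name = "beta" }
        };

        // LINQ: builds a new, sorted list; the original list is left untouched.
        List<Widget> sortedCopy = myList.OrderBy(x => x.Name).ToList();

        // List<T>.Sort: sorts the existing list in place, no second list allocated.
        myList.Sort((a, b) => string.Compare(a.Name, b.Name, StringComparison.Ordinal));

        Console.WriteLine(string.Join(", ", myList.Select(w => w.Name)));
        Console.WriteLine(string.Join(", ", sortedCopy.Select(w => w.Name)));
    }
}
```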
EDIT:
The general consensus is that Sort() is faster when you have a concrete List or Array. OrderBy is reasonable but slower, because it first copies the passed enumerable into an array, which adds O(N) work and storage. Initializing a SortedList from unsorted data ends up being O(N^2) because of what's under the hood. Moral of the story: use List.Sort() instead of List.OrderBy() when you have an actual List.
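If you want to sanity-check that consensus yourself, a rough micro-benchmark along these lines works; the element count and the use of Stopwatch are arbitrary choices, and a proper benchmark harness would give more reliable numbers:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class SortTimings
{
    static void Main()
    {
        var rng = new Random(42);
        List<int> source = Enumerable.Range(0, 1000000).Select(i => rng.Next()).ToList();

        var sw = Stopwatch.StartNew();
        List<int> viaOrderBy = source.OrderBy(x => x).ToList();   // copies into an array, sorts, copies again
        sw.Stop();
        Console.WriteLine("OrderBy().ToList(): {0} ms", sw.ElapsedMilliseconds);

        var copy = new List<int>(source);   // fresh copy so both calls see the same unsorted data
        sw.Restart();
        copy.Sort();                        // sorts in place, no extra allocation
        sw.Stop();
        Console.WriteLine("List<T>.Sort():     {0} ms", sw.ElapsedMilliseconds);
    }
}
```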
Enumerable.OrderBy() slurps the IEnumerable<> into an array and uses quick sort, so it has O(n) storage requirements. It's done by an internal class in System.Core.dll, EnumerableSort<TElement>.QuickSort(). The storage cost makes it uncompetitive with simply sorting the list, if you have one, since List<> sorts in place. Linq often optimizes by checking the true capabilities of the IEnumerable with the is operator, but that won't work here, since List<>.Sort is destructive.
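A small sketch of that last point: OrderBy has to leave its source untouched, which is exactly what Sort doesn't do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DestructiveVsNot
{
    static void Main()
    {
        var list = new List<int> { 3, 1, 2 };

        var ordered = list.OrderBy(x => x).ToList();
        Console.WriteLine(string.Join(", ", list));   // 3, 1, 2  (source unchanged)

        list.Sort();
        Console.WriteLine(string.Join(", ", list));   // 1, 2, 3  (sorted in place)
    }
}
```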
List<>.Sort and Array.Sort use in-place quick sort.
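For reference, a textbook in-place quicksort looks roughly like this; this isn't the BCL's code, just the general shape of the algorithm:

```csharp
// Textbook in-place quicksort with a naive Lomuto partition, for illustration only.
static class QuickSortSketch
{
    public static void Sort(int[] items)
    {
        Sort(items, 0, items.Length - 1);
    }

    static void Sort(int[] items, int lo, int hi)
    {
        if (lo >= hi) return;
        int p = Partition(items, lo, hi);
        Sort(items, lo, p - 1);
        Sort(items, p + 1, hi);
    }

    static int Partition(int[] items, int lo, int hi)
    {
        int pivot = items[hi];   // simplistic pivot choice; real implementations pick more carefully
        int i = lo;
        for (int j = lo; j < hi; j++)
        {
            if (items[j] < pivot)
            {
                Swap(items, i, j);
                i++;
            }
        }
        Swap(items, i, hi);      // put the pivot in its final position
        return i;
    }

    static void Swap(int[] items, int a, int b)
    {
        int tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }
}
```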
SortedList<> has O(n) complexity for an insertion, which dominates the O(log n) complexity of finding the insertion point, so putting N unsorted items into it costs O(n^2). SortedDictionary<> uses a red-black tree, giving O(log n) inserts and therefore O(n log n) to fill it, the same as quick sort's average case.
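A sketch of that difference when filling both containers with unsorted keys; the element count is arbitrary:

```csharp
using System;
using System.Collections.Generic;

class FillComparison
{
    static void Main()
    {
        var rng = new Random(1);
        int n = 50000;

        // Each Add on SortedList<,> shifts elements in its backing arrays: O(n) per insert,
        // so filling it with unsorted keys is O(n^2) overall.
        var sortedList = new SortedList<int, string>();

        // SortedDictionary<,> is a red-black tree: O(log n) per insert, O(n log n) to fill.
        var sortedDict = new SortedDictionary<int, string>();

        for (int i = 0; i < n; i++)
        {
            int key = rng.Next();
            sortedList[key] = "value";   // indexer assignment avoids duplicate-key exceptions
            sortedDict[key] = "value";
        }
    }
}
```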
A quick gander through Reflector tells me that the List sort methods use quicksort (http://en.wikipedia.org/wiki/Quicksort) via System.Collections.Generic.GenericArraySortHelper.
SortedList uses Array.BinarySearch to figure out where to insert each new item on Add (see the sketch below).
Enumerators don't have sorting logic.
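That binary-search-then-insert pattern is easy to reproduce on a plain List<T> if you want the same keep-it-sorted behavior yourself; a minimal sketch:

```csharp
using System;
using System.Collections.Generic;

class SortedInsert
{
    static void InsertSorted(List<int> list, int value)
    {
        // BinarySearch returns the bitwise complement of the insertion index
        // when the value isn't already in the list.
        int index = list.BinarySearch(value);
        if (index < 0)
        {
            index = ~index;
        }
        list.Insert(index, value);   // Insert itself is O(n): elements after index shift right
    }

    static void Main()
    {
        var list = new List<int> { 1, 3, 5 };
        InsertSorted(list, 4);
        Console.WriteLine(string.Join(", ", list));   // 1, 3, 4, 5
    }
}
```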
Quicksort is a good choice for most situations, though it can degrade to O(n^2) if you're really unlucky with the input data.
If you suspect your input is a huge pile of data in an unlucky (already sorted) order for quicksort, one trick is to randomize the data first (which is always cheap) and then sort the randomized data, as in the sketch below. There are also tricks a quicksort implementation can use to mitigate the problem of already sorted (or nearly sorted) input, such as better pivot selection; I don't know whether the BCL implementation does any of these.
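A sketch of the randomize-first trick, using a Fisher-Yates shuffle before calling Sort; the data size is arbitrary:

```csharp
using System;
using System.Collections.Generic;

class ShuffleThenSort
{
    // Fisher-Yates shuffle: O(n), cheap compared to the sort itself.
    static void Shuffle<T>(IList<T> items, Random rng)
    {
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
    }

    static void Main()
    {
        var data = new List<int>();
        for (int i = 0; i < 1000000; i++) data.Add(i);   // pathological-ish input: already sorted

        Shuffle(data, new Random());   // randomize first
        data.Sort();                   // now the pivot choices behave like the average case
    }
}
```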