Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Short Circuiting sort

I understand that:

head (map (2**) [1..999999])

Will only actually evaluate 2**1, and none of the rest, but the book I am reading says that:

head (sort somelist)

Will only need to find the smallest item in the list, because that is all that is used. How does this work? As far as I can tell, this would be impossible with the sorting algorithms I know (like bubble sorting).

The only way I can think that this would work is if the sorting algorithm were to go through the entire list looking for the smallest item, and then recurse on the list without that item. To me, this sounds really slow.

Is this how the sort function works, or is there another sorting algorithm I don't know about, that would allow for short circuiting like it is?

like image 294
Jeffrey Aylesworth Avatar asked Dec 01 '09 21:12

Jeffrey Aylesworth


1 Answers

This:

Will only need to find the smallest item in the list, because that is all that is used.

... should really say that the function only needs to do the minimal amount of work that the sorting algorithm requires to find the smallest element.

For example, if we are using quicksort as our underlying sorting algorithm, then head . quicksort is equivalent to the optimal (!) selection algorithm known as 'quickselect', which is worst-case linear. Moreover, we can implement k-quickselect merely by take k . quicksort.

Wikipedia notes in its article upon selection algorithms that (my emphasis):

Because language support for sorting is more ubiquitous, the simplistic approach of sorting followed by indexing is preferred in many environments despite its disadvantage in speed. Indeed, for lazy languages, this simplistic approach can even get you the best complexity possible for the k smallest/greatest sorted (with maximum/minimum as a special case) if your sort is lazy enough.

Quicksort works well in this scenario, whereas the default sort in Haskell (merge sort) doesn't compose quite as well, as it does more work than strictly necessary to return each element of the sorted list. As this post on the Haskell mailing list notes:

lazy quicksort is able to produce the batch of the first k smallest elements in

O(n + k log k) total time [1]

while lazy mergesort needs

O(n + k log n) total time [2]

For more you might like to read this blog post.

like image 88
porges Avatar answered Oct 21 '22 13:10

porges