I created this program for an assignment in which we were required to create an implementation of Quichesort. This is a hybrid sorting algorithm that uses Quicksort until it reaches a certain recursion depth (log2(N), where N is the length of the list), then switches to Heapsort, to avoid exceeding the maximum recursion depth.
While testing my implementation, I discovered that although it generally performed better than regular Quicksort, Heapsort consistently outperformed both. Can anyone explain why Heapsort performs better, and under what circumstances Quichesort would be better than both Quicksort and Heapsort?
Note that for some reason, the assignment referred to the algorithm as "Quipsort".
Edit: Apparently, "Quichesort" is actually identical to Introsort.
I also noticed that a logic error in my medianOf3()
function was
causing it to return the wrong value for certain inputs. Here is an improved
version of the function:
def medianOf3(lst):
"""
From a lst of unordered data, find and return the the median value from
the first, middle and last values.
"""
first, last = lst[0], lst[-1]
if len(lst) <= 2:
return min(first, last)
middle = lst[(len(lst) - 1) // 2]
return sorted((first, middle, last))[1]
Would this explain the algorithm's relatively poor performance?
import heapSort # heapSort
import math # log2 (for quicksort depth limit)
def medianOf3(lst):
"""
From a lst of unordered data, find and return the the median value from
the first, middle and last values.
"""
first, last = lst[0], lst[-1]
if len(lst) <= 2:
return min(first, last)
median = lst[len(lst) // 2]
return max(min(first, median), min(median, last))
def partition(pivot, lst):
"""
partition: pivot (element in lst) * List(lst) ->
tuple(List(less), List(same, List(more))).
Where:
List(Less) has values less than the pivot
List(same) has pivot value/s, and
List(more) has values greater than the pivot
e.g. partition(5, [11,4,7,2,5,9,3]) == [4,2,3], [5], [11,7,9]
"""
less, same, more = [], [], []
for val in lst:
if val < pivot:
less.append(val)
elif val > pivot:
more.append(val)
else:
same.append(val)
return less, same, more
def quipSortRec(lst, limit):
"""
A non in-place, depth limited quickSort, using median-of-3 pivot.
Once the limit drops to 0, it uses heapSort instead.
"""
if lst == []:
return []
if limit == 0:
return heapSort.heapSort(lst)
limit -= 1
pivot = medianOf3(lst)
less, same, more = partition(pivot, lst)
return quipSortRec(less, limit) + same + quipSortRec(more, limit)
def quipSort(lst):
"""
The main routine called to do the sort. It should call the
recursive routine with the correct values in order to perform
the sort
"""
depthLim = int(math.log2(len(lst)))
return quipSortRec(lst, depthLim)
import heapq # mkHeap (for adding/removing from heap)
def heapSort(lst):
"""
heapSort(List(Orderable)) -> List(Ordered)
performs a heapsort on 'lst' returning a new sorted list
Postcondition: the argument lst is not modified
"""
heap = list(lst)
heapq.heapify(heap)
result = []
while len(heap) > 0:
result.append(heapq.heappop(heap))
return result
Efficient sorting is important for optimizing the efficiency of other algorithms (such as search and merge algorithms) that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.
Quicksort algorithm is efficient if the size of the input is very large. But, insertion sort is more efficient than quick sort in case of small arrays as the number of comparisons and swaps are less compared to quicksort. So we combine the two algorithms to sort efficiently using both approaches.
The basic facts are as follows:
One question to ask is, why is quicksort faster "in practice" than heapsort? This is a tough one to answer, but most answers point to how quicksort has better spatial locality, leading to fewer cache misses. However, I'm not sure how applicable this is to Python, as it is running in an interpreter and has a lot more junk going on under the hood than other languages (e.g. C) that could interfere with cache performance.
As to why your particular introsort implementation is slower than Python's heapsort - again, this is difficult to determine. First of all, note that the heapq module is written in Python, so it's on a relatively even footing with your implementation. It may be that creating and concatenating many smaller lists is costly, so you could try rewriting your quicksort to act in-place and see if that helps. You could also try tweaking various aspects of the implementation to see how that affects performance, or run the code through a profiler and see if there are any hot spots. But in the end I think it's unlikely you'll find a definite answer. It may just boil down to which operations are particularly fast or slow in the Python interpreter.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With