Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Benefits of Quichesort

I created this program for an assignment in which we were required to create an implementation of Quichesort. This is a hybrid sorting algorithm that uses Quicksort until it reaches a certain recursion depth (log2(N), where N is the length of the list), then switches to Heapsort, to avoid exceeding the maximum recursion depth.

While testing my implementation, I discovered that although it generally performed better than regular Quicksort, Heapsort consistently outperformed both. Can anyone explain why Heapsort performs better, and under what circumstances Quichesort would be better than both Quicksort and Heapsort?

Note that for some reason, the assignment referred to the algorithm as "Quipsort".

Edit: Apparently, "Quichesort" is actually identical to Introsort.

I also noticed that a logic error in my medianOf3() function was causing it to return the wrong value for certain inputs. Here is an improved version of the function:

def medianOf3(lst):
    """
    From a lst of unordered data, find and return the the median value from
    the first, middle and last values.
    """

    first, last = lst[0], lst[-1]
    if len(lst) <= 2:
        return min(first, last)
    middle = lst[(len(lst) - 1) // 2]
    return sorted((first, middle, last))[1]

Would this explain the algorithm's relatively poor performance?

Code for Quichesort:

import heapSort             # heapSort
import math                 # log2 (for quicksort depth limit)

def medianOf3(lst):
    """
    From a lst of unordered data, find and return the the median value from
    the first, middle and last values.
    """

    first, last = lst[0], lst[-1]
    if len(lst) <= 2:
        return min(first, last)
    median = lst[len(lst) // 2]
    return max(min(first, median), min(median, last))

def partition(pivot, lst):
   """
   partition: pivot (element in lst) * List(lst) -> 
        tuple(List(less), List(same, List(more))).  
   Where:
        List(Less) has values less than the pivot
        List(same) has pivot value/s, and
        List(more) has values greater than the pivot

   e.g. partition(5, [11,4,7,2,5,9,3]) == [4,2,3], [5], [11,7,9]
   """

   less, same, more = [], [], []
   for val in lst:
      if val < pivot:
         less.append(val)
      elif val > pivot:
         more.append(val)
      else:
         same.append(val)
   return less, same, more

def quipSortRec(lst, limit):
    """
    A non in-place, depth limited quickSort, using median-of-3 pivot.
    Once the limit drops to 0, it uses heapSort instead.
    """

    if lst == []:
        return []

    if limit == 0:
        return heapSort.heapSort(lst)

    limit -= 1
    pivot = medianOf3(lst)
    less, same, more = partition(pivot, lst)
    return quipSortRec(less, limit) + same + quipSortRec(more, limit)

def quipSort(lst):
    """
    The main routine called to do the sort.  It should call the
    recursive routine with the correct values in order to perform
    the sort
    """

    depthLim = int(math.log2(len(lst)))
    return quipSortRec(lst, depthLim)

Code for Heapsort:

import heapq    # mkHeap (for adding/removing from heap)

def heapSort(lst):
    """
    heapSort(List(Orderable)) -> List(Ordered)
        performs a heapsort on 'lst' returning a new sorted list
    Postcondition: the argument lst is not modified
    """

    heap = list(lst)
    heapq.heapify(heap)
    result = []
    while len(heap) > 0:
        result.append(heapq.heappop(heap))
    return result
like image 565
Slartibartfast Avatar asked Dec 21 '14 18:12

Slartibartfast


People also ask

What is the benefit of sorting algorithm?

Efficient sorting is important for optimizing the efficiency of other algorithms (such as search and merge algorithms) that require input data to be in sorted lists. Sorting is also often useful for canonicalizing data and for producing human-readable output.

Why is Quick sort better than insertion sort?

Quicksort algorithm is efficient if the size of the input is very large. But, insertion sort is more efficient than quick sort in case of small arrays as the number of comparisons and swaps are less compared to quicksort. So we combine the two algorithms to sort efficiently using both approaches.


1 Answers

The basic facts are as follows:

  • Heapsort has worst-case O(n log(n)) performance but tends to be slow in practice.
  • Quicksort has O(n log(n)) performance on average, but O(n^2) in the worst case but is fast in practice.
  • Introsort is intended to harness the fast-in-practice performance of quicksort, while still guaranteeing the worst-case O(n log(n)) behavior of heapsort.

One question to ask is, why is quicksort faster "in practice" than heapsort? This is a tough one to answer, but most answers point to how quicksort has better spatial locality, leading to fewer cache misses. However, I'm not sure how applicable this is to Python, as it is running in an interpreter and has a lot more junk going on under the hood than other languages (e.g. C) that could interfere with cache performance.

As to why your particular introsort implementation is slower than Python's heapsort - again, this is difficult to determine. First of all, note that the heapq module is written in Python, so it's on a relatively even footing with your implementation. It may be that creating and concatenating many smaller lists is costly, so you could try rewriting your quicksort to act in-place and see if that helps. You could also try tweaking various aspects of the implementation to see how that affects performance, or run the code through a profiler and see if there are any hot spots. But in the end I think it's unlikely you'll find a definite answer. It may just boil down to which operations are particularly fast or slow in the Python interpreter.

like image 146
augurar Avatar answered Sep 23 '22 19:09

augurar