
Why is the constant factor of quicksort better than that of heapsort?

According to my calculation:

  • Quicksort cost = n + (n/2 + n/2) + (n/4 + n/4 + n/4 + n/4) + ... = n * log(n) = log(n^n)
  • Heapsort cost = sum [log(i)] for i = n, n-1, n-2, ..., 1 = log(n!)

Why is it said that quicksort has a better constant factor than heapsort, and that quicksort is therefore better than heapsort on average? Isn't log(n^n) > log(n!)?
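For concreteness, the two cost expressions above can be evaluated numerically (a quick sketch; the loop bounds and sizes are arbitrary choices for illustration):

```python
import math

# Compare the two cost expressions from the question:
# quicksort ~ n * log2(n) = log2(n^n), heapsort ~ sum of log2(i) = log2(n!)
for n in [10, 1000, 100000]:
    quick = n * math.log2(n)                            # log2(n^n)
    heap = sum(math.log2(i) for i in range(2, n + 1))   # log2(n!)
    print(n, round(quick), round(heap), round(quick / heap, 3))
```

The inequality log(n^n) > log(n!) does hold, but the ratio shrinks toward 1 as n grows, which is part of the answer below.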

asked Aug 26 '13 by user389955
2 Answers

I think the issue here is that your analysis of quicksort and heapsort is not precise enough to show why the constant factors would be different.

You can indeed show that on average, quicksort will do more comparisons than heapsort (roughly 1.44 n log2 n for quicksort versus n log2 n for heapsort). However, comparisons are not the only determining factor in the runtime of heapsort and quicksort.

The main reason quicksort is faster is locality of reference. Due to the way that memory caches work, array accesses in locations that are adjacent to one another tend to be much, much faster than array accesses scattered throughout an array. In quicksort, the partitioning step typically does all its reads and writes at the ends of the arrays, so the array accesses are closely packed toward one another. Heapsort, on the other hand, jumps around the array as it moves up and down the heap. Therefore, the array accesses in quicksort, on average, take much less time than the array accesses in heapsort. The difference is large enough that the constant factor in front of the n log n term in quicksort is lower than the constant factor in front of the n log n term in heapsort, which is one reason why quicksort is much faster than heapsort.

In short - if all we care about are comparisons, heapsort is a better choice than quicksort. But since memory systems use caches and cache misses are expensive, quicksort is usually a much better option.

Also, note that log(nn) = n log n and log (n!) = n log n - n + O(log n) via Stirling's approximation. This means that log (n!) is not much smaller than n log n, even as n gets very large. There definitely is a difference, but it's not large enough to make a huge dent on its own.
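Stirling's approximation can also be checked numerically (a sketch using `math.lgamma`, since lgamma(n + 1) = log(n!)):

```python
import math

# Check Stirling: log(n!) = n*log(n) - n + O(log n)  (natural logs)
for n in [10, 1000, 100000]:
    log_fact = math.lgamma(n + 1)        # ln(n!)
    approx = n * math.log(n) - n         # leading Stirling terms
    print(n, round(log_fact, 1), round(approx, 1), round(log_fact - approx, 2))
```

The remainder grows only logarithmically, confirming that log(n!) tracks n log n - n closely.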

Hope this helps!

answered Oct 30 '22 by templatetypedef

Here are paragraphs from Steven S. Skiena's The Algorithm Design Manual discussing the relative speed of the three O(n log n) sorting algorithms:

But how can we compare two Θ(n log n) algorithms to decide which is faster? How can we prove that quicksort is really quick? Unfortunately, the RAM model and Big Oh analysis provide too coarse a set of tools to make that type of distinction. When faced with algorithms of the same asymptotic complexity, implementation details and system quirks such as cache performance and memory size may well prove to be the decisive factor.

What we can say is that experiments show that a properly implemented quicksort is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. But I can’t argue with you if you don’t believe me when I say quicksort is faster. It is a question whose solution lies outside the analytical tools we are using. The best way to tell is to implement both algorithms and experiment.

- 4.6.3 "Is Quicksort Really Quick?", The Algorithm Design Manual
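Skiena's advice to just implement both and experiment can be followed directly. Below is a rough benchmark sketch (my own toy implementations; absolute timings in pure Python are dominated by interpreter overhead, so treat any ratio with caution and prefer a compiled language for serious measurement):

```python
import random
import time

def quicksort(xs):
    # simple random-pivot quicksort (illustrative, not tuned)
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    lo = [x for x in xs if x < pivot]
    eq = [x for x in xs if x == pivot]
    hi = [x for x in xs if x > pivot]
    return quicksort(lo) + eq + quicksort(hi)

def heapsort(xs):
    # simple in-place heapsort with iterative sift-down
    xs = list(xs)
    def sift(i, end):
        while 2 * i + 1 < end:
            c = 2 * i + 1
            if c + 1 < end and xs[c] < xs[c + 1]:
                c += 1                      # pick the larger child
            if xs[i] < xs[c]:
                xs[i], xs[c] = xs[c], xs[i]
                i = c
            else:
                break
    n = len(xs)
    for i in range(n // 2 - 1, -1, -1):     # build max-heap
        sift(i, n)
    for end in range(n - 1, 0, -1):         # repeatedly extract max
        xs[0], xs[end] = xs[end], xs[0]
        sift(0, end)
    return xs

data = [random.random() for _ in range(50000)]
for name, f in [("quicksort", quicksort), ("heapsort", heapsort)]:
    t0 = time.perf_counter()
    out = f(data)
    print(name, round(time.perf_counter() - t0, 3), "s, sorted:", out == sorted(data))
```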

answered Oct 30 '22 by vvy