
Exactly how many comparisons does merge sort make?

I have read that quicksort is much faster than mergesort in practice, and that the reason for this is the hidden constant factor in the comparison count.

Well, the expected number of comparisons made by randomized quicksort is 2n ln n = (2 ln 2) n lg n ≈ 1.39 n lg n, which means that the constant in quicksort is 1.39.

But what about mergesort? What is the constant in mergesort?

asked Dec 16 '11 by geniaz1



2 Answers

Let's see if we can work this out!

In merge sort, at each level of the recursion, we do the following:

  1. Split the array in half.
  2. Recursively sort each half.
  3. Use the merge algorithm to combine the two halves together.

So how many comparisons are done at each step? Well, the divide step doesn't make any comparisons; it just splits the array in half. Step 2 doesn't (directly) make any comparisons; all comparisons are done by the recursive calls. In step 3, we have two arrays of size n/2 and need to merge them. This requires at most n comparisons, since each step of the merge algorithm does one comparison and then consumes at least one array element, so the merge can't do more than n comparisons in total.
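To see this counting in code, here is a minimal Python sketch of the scheme described above. The names merge_sort_count and merge_count are just for illustration, and this is one possible implementation rather than a canonical one; the point is only that every comparison in the merge consumes one element.

    # Illustrative sketch: merge sort that counts comparisons explicitly.
    # merge_sort_count and merge_count are made-up names for this example.
    def merge_sort_count(a):
        """Return (sorted_list, number_of_comparisons)."""
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, c_left = merge_sort_count(a[:mid])    # step 2: sort each half
        right, c_right = merge_sort_count(a[mid:])
        merged, c_merge = merge_count(left, right)  # step 3: merge the halves
        return merged, c_left + c_right + c_merge

    def merge_count(left, right):
        """Merge two sorted lists, counting element comparisons."""
        out, i, j, comparisons = [], 0, 0, 0
        while i < len(left) and j < len(right):
            comparisons += 1                 # one comparison per loop iteration
            if left[i] <= right[j]:
                out.append(left[i]); i += 1  # each comparison consumes one element
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])                 # leftovers need no comparisons
        out.extend(right[j:])
        return out, comparisons

Calling merge_sort_count on a list returns the sorted list together with the number of comparisons, which you can compare against the recurrence below.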

Combining this together, we get the following recurrence:

C(1) = 0
C(n) = 2C(n / 2) + n

(As mentioned in the comments, the linear term is more precisely (n - 1), though this doesn’t change the overall conclusion. We’ll use the above recurrence as an upper bound.)

To simplify this, let's define n = 2^k and rewrite this recurrence in terms of k:

C'(0) = 0
C'(k) = 2C'(k - 1) + 2^k

The first few terms here are 0, 2, 8, 24, ... . This looks something like k · 2^k, and we can prove this by induction. As our base case, when k = 0, the first term is 0, and the value of k · 2^k is also 0. For the inductive step, assume the claim holds for some k and consider k + 1. Then the value is 2(k · 2^k) + 2^(k + 1) = k · 2^(k + 1) + 2^(k + 1) = (k + 1) · 2^(k + 1), so the claim holds for k + 1, completing the induction. Thus the value of C'(k) is k · 2^k. Since n = 2^k, this means that, assuming that n is a perfect power of two, we have that the number of comparisons made is

C(n) = n lg n
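As a quick sanity check (still assuming n is a power of two), you can evaluate the recurrence directly and compare it with n lg n. This is just a sketch, not part of the proof above:

    from math import log2

    # Evaluate the upper-bound recurrence C(n) = 2*C(n/2) + n, C(1) = 0,
    # for powers of two and compare it with the closed form n * lg n.
    def C(n):
        return 0 if n == 1 else 2 * C(n // 2) + n

    for k in range(1, 6):
        n = 2 ** k
        print(n, C(n), int(n * log2(n)))   # the two counts agree exactly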

Impressively, this is better than quicksort! So why on earth is quicksort faster than merge sort? This has to do with factors other than the number of comparisons made. Primarily, since quicksort works in place while merge sort works out of place, the locality of reference is not nearly as good in merge sort as it is in quicksort. This is such a big factor that quicksort ends up being much, much faster than merge sort in practice, since the cost of a cache miss is quite high.

Additionally, the time required to sort an array doesn't depend only on the number of comparisons. Other factors, such as the number of times each array element is moved, can also matter. For example, in merge sort we need to allocate space for the buffered elements, move elements out so that they can be merged, and then merge them back into the array. These moves aren't counted in our analysis, but they definitely add up. Compare this to quicksort's partitioning step, which moves each array element at most once and stays within the original array. These extra factors, not the number of comparisons made, dominate the algorithm's runtime.

This analysis is a bit less precise than the optimal one, but Wikipedia confirms that the analysis is roughly n lg n and that this is indeed fewer comparisons than quicksort's average case.

Hope this helps!

answered Nov 04 '22 by templatetypedef

In the worst case, and assuming a straightforward implementation, the number of comparisons needed to sort n elements is

n ⌈lg n⌉ − 2^⌈lg n⌉ + 1

where lg n indicates the base-2 logarithm of n.

This result can be found in the corresponding Wikipedia article or recent editions of The Art of Computer Programming by Donald Knuth, and I just wrote down a proof for this answer.
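As a rough numerical check, here is a small Python sketch that assumes the usual worst-case recurrence W(n) = W(⌊n/2⌋) + W(⌈n/2⌉) + (n − 1) for a straightforward top-down merge sort (merging m elements costs at most m − 1 comparisons); the function names are just for illustration:

    from math import ceil, log2

    # Worst-case comparison count from the recurrence, assuming a
    # straightforward top-down merge sort.
    def W(n):
        if n <= 1:
            return 0
        return W(n // 2) + W(n - n // 2) + (n - 1)

    def closed_form(n):
        k = ceil(log2(n))               # k = ceil(lg n)
        return n * k - 2 ** k + 1

    for n in range(1, 33):
        assert W(n) == closed_form(n)   # the two expressions agree for small n
    print("formula matches the recurrence for n = 1..32")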

answered Nov 04 '22 by MvG