Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is counting sort a stable sort?

Tags:

Suppose my input is (a,b and c to distinguish between equal keys)

1 6a 8 3 6b 0 6c 4 

My counting sort will save as (discarding the a,b and c info!!)

0(1) 1(1) 3(1) 4(1) 6(3) 8(1) 

which will give me the result

0 1 3 4 6 6 6 8 

So, how is this stable sort? I am not sure how it is "maintaining the relative order of records with equal keys."

Please explain.

like image 490
Lazer Avatar asked Apr 03 '10 18:04

Lazer


People also ask

How do you prove that counting sort is stable?

An important property of counting sort is that it is stable: numbers with the same value appear in the output array in the same order as in the input array. It breaks ties between two numbers by the rule that whichever number appears first in the input array appears first in the output array.

Is counting sort stable and in place?

Is counting sort an in-place sorting algorithm? Answer: No, counting sort is not an in-place sorting algorithm. In in-place sorting algorithms, only a small/constant auxiliary space is used; in counting sort, we use an auxiliary array of the size of the range of data.

Which sorting method is stable?

Several common sorting algorithms are stable by nature, such as Merge Sort, Timsort, Counting Sort, Insertion Sort, and Bubble Sort. Others such as Quicksort, Heapsort and Selection Sort are unstable. We can modify unstable sorting algorithms to be stable.

Is distribution counting algorithm stable?

Recall that distribution counting is a stable sorting algorithm. Since LSD is just a result of running distribution counting across all the digits of the numbers being sorted, LSD radix sort is also a stable sorting algorithm.


2 Answers

To understand why counting sort is stable, you need to understand that counting sort can not only be used for sorting a list of integers, it can also be used for sorting a list of elements whose key is an integer, and these elements will be sorted by their keys while having additional information associated with each of them.

A counting sort example that sorts elements with additional information will help you to understand this. For instance, we want to sort three stocks by their prices:

[(GOOG 3), (CSCO 1), (MSFT 1)] 

Here stock prices are integer keys, and stock names are their associated information.

Expected output for the sorting should be:

[(CSCO 1), (MSFT 1), (GOOG 3)]  (containing both stock price and its name, and the CSCO stock should appear before MSFT so that it is a stable sort) 

A counts array will be calculated for sorting this (let's say stock prices can only be 0 to 3):

counts array: [0, 2, 0, 1] (price "1" appear twice, and price "3" appear once) 

If you are just sorting an integer array, you can go through the counts array and output "1" twice and "3" once and it is done, and the entire counts array will become an all-zero array after this.

But here we want to have stock names in sorting output as well. How can we obtain this additional information (it seems the counts array already discards this piece of information)? Well, the associated information is stored in the original unsorted array. In the unsorted array [(GOOG 3), (CSCO 1), (MSFT 1)], we have both the stock name and its price available. If we get to know which position (GOOG 3) should be in the final sorted array, we can copy this element to the sorted position in the sorted array.

To obtain the final position for each element in the sorted array, unlike sorting an integer array, you don't use the counts array directly to output the sorted elements. Instead, counting sort has an additional step which calculates the cumulative sum array from the counts array:

counts array: [0, 2, 2, 3] (i from 0 to 3: counts[i] = counts[i] + counts[i - 1])  

This cumulative sum array tells us each value's position in the final sorted array currently. For example, counts[1]==2 means currently item with value 1 should be placed in the 2nd slot in the sorted array. Intuitively, because counts[i] is the cumulative sum from left, it shows how many smaller items are before the ith value, which tells you where the position should be for the ith value.

If a $1 price stock appears at the first time, it should be outputted to the second position of the sorted array and if a $3 price stock appears at the first time, it should be outputted to the third position of the sorted array. If a $1 stock appears and its element gets copied to the sorted array, we will decreased its count in the counts array.

counts array: [0, 1, 2, 3]  (so that the second appearance of $1 price stock's position will be 1) 

So we can iterate the unsorted array from backwards (this is important to ensure the stableness), check its position in the sorted array according to the counts array, and copied it to the sorted array.

sorted array: [null, null, null] counts array: [0, 2, 2, 3]      iterate stocks in unsorted stocks from backwards 1. the last stock (MSFT 1) sorted array: [null, (MSFT 1), null] (copy to the second position because counts[1] == 2) counts array: [0, 1, 2, 3] (decrease counts[1] by 1)  2. the middle stock (CSCO 1) sorted array: [(CSCO 1), (MSFT 1), null] (copy to the first position because counts[1] == 1 now) counts array: [0, 0, 2, 3] (decrease counts[1] by 1)  3. the first stock (GOOG 3) sorted array: [(CSCO 1), (MSFT 1), (GOOG 3)] (copy to the third position because counts[3] == 3) counts array: [0, 0, 2, 2] (decrease counts[3] by 1) 

As you can see, after the array gets sorted, the counts array (which is [0, 0, 2, 2]) doesn't become an all-zero array like sorting an array of integers. The counts array is not used to tell how many times an integer appears in the unsorted array, instead, it is used to tell which position the element should be in the final sorted array. And since we decrease the count every time we output an element, we are essentially making the elements with same key's next appearance final position smaller. That's why we need to iterate the unsorted array from backwards to ensure its stableness.

Conclusion:

Since each element contains not only an integer as key, but also some additional information, even if their key is the same, you could tell each element is different by using the additional information, so you will be able to tell if it is a stable sorting algorithm (yes, it is a stable sorting algorithm if implemented appropriately).

References:

Some good materials explaining counting sort and its stableness:

  • http://www.algorithmist.com/index.php/Counting_sort (this article explains this question pretty well)
  • http://courses.csail.mit.edu/6.006/fall11/rec/rec07.pdf
  • http://rosettacode.org/wiki/Sorting_algorithms/Counting_sort (a list of counting sort implementations in different programming languages. If you compare them with the algorithm in wikipedia's entry below about counting sort, you will find most of which doesn't implement the exact counting sort correctly but implement only the integer sorting function and they don't have the additional step to calculate the cumulative sum array. But you could check out the implementation in 'Go' programming language in this link, which does provides two different implementations, one is used for sorting integers only and the other can be used for sorting elements containing additional information)
  • http://en.wikipedia.org/wiki/Counting_sort
like image 193
nybon Avatar answered Sep 17 '22 08:09

nybon


Simple, really: instead of a simple counter for each 'bucket', it's a linked list.

That is, instead of

0(1) 1(1) 3(1) 4(1) 6(3) 8(1) 

You get

0(.) 1(.) 3(.) 4(.) 6(a,b,c) 8(.) 

(here I use . to denote some item in the bucket).

Then just dump them back into one sorted list:

0 1 3 4 6a 6b 6c 8 

That is, when you find an item with key x, knowing that it may have other information that distinguishes it from other items with the same key, you don't just increment a counter for bucket x (which would discard all those extra information).

Instead, you have a linked list (or similarly ordered data structure with constant time amortized append) for each bucket, and you append that item to the end of the list for bucket x as you scan the input left to right.

So instead of using O(k) space for k counters, you have O(k) initially empty lists whose sum of lengths will be n at the end of the "counting" portion of the algorithm. This variant of counting sort will still be O(n + k) as before.

like image 44
polygenelubricants Avatar answered Sep 21 '22 08:09

polygenelubricants