I am reading the definitions of radix, counting and bucket sorts and it seems that all of them are just the code below: <pre class="prettyprint"><code>public static void sort(int[] a, int maxVal){ int [] bucket=new int[maxVal+1]; for (int i=0; i<bucket.length; i++){ bucket[i]=0; } for (int i=0; i<a.length; i++){ bucket[a[i]]++; } int outPos=0; for (int i=0; i<bucket.length; i++){ for (int j=0; j<bucket[i]; j++){ a[outPos++]=i; } } } </code></pre> I know I can't be right, so what am I missing? Show the code if you think that can help explaining in Java or C.

Let's start with some rewriting your code in C, because C more familiar for me to explain. So lets recall your code with some comments: <pre class="prettyprint lang-c prettyprint-override"><code>int counting_sort(int a[], int a_len, int maxVal) { int i, j, outPos = 0; int bucket_len = maxVal+1; int bucket[bucket_len]; /* simple bucket structure */ memset(bucket, 0, sizeof(int) * bucket_len); /* one loop bucket processing */ for (i = 0; i < a_len; i++) { bucket[a[i]]++; /* simple work with buckets */ } for (i=0; i < bucket_len; i++) { for (j = 0; j < bucket[i]; j++) { a[outPos++] = i; } } return 0; } </code></pre> Now lets offer to this guy some realistic data: [126, 348, 343, 432, 316, 171, 556, 223, 670, 201] On output we have [126, 171, 201, 223, 316, 343, 348, 432, 556, 670] It seems that everything is ok? Not yet. Lets look at maxVal. It is 670 (!) To sort array of 10 elements here we used array of 670 elements, primarily zeros. Awfully. To handle this problem of counting sort, we have two possible ways of generalization: 1) First way -- to make sort digit-wise. This is called radix-sort. Lets show some code, trying to make it as close to counting-sort code as possible. Again look at comments: <pre class="prettyprint lang-c prettyprint-override"><code>int radix_sort(int a[], int a_len, int ndigits) { int i; int b[a_len]; int expn = 1; /* additional loop for digits */ for (i = 0; i != ndigits; ++i) { int j; int bucket[10] = {0}; /* still simple buckets */ /* bucket processing becomes tricky */ for (j = 0; j != a_len; ++j) bucket[ a[j] / expn % 10 ]++; for (j = 1; j != 10; ++j) bucket[j] += bucket[j - 1]; for (j = a_len - 1; j >= 0; --j) b[--bucket[a[j] / expn % 10]] = a[j]; for (j = 0; j != a_len; ++j) a[j] = b[j]; expn *= 10; } } </code></pre> We are trading multiplier near N for memory. Profit? Maybe. But in some cases multiplier near N is very important. Program, working a day and working a week are very different from users view even if both works 1*O(N) and 7*O(N) respectively. And so we are coming to a second generalization: 2) Second way -- to make buckets more sophisticated. This is called bucket-sort. Lets again start with some code. I prefer more code prior to philosophical arguments. Still look at comments, they are essential. <pre class="prettyprint lang-c prettyprint-override"><code>int bucket_sort(int a[], int a_len, int maxVal) { int i, aidx; typedef struct tag_list { int elem; struct tag_list *next; } list_t, *list_p; list_p bucket[10] = {0}; /* sophisticated buckets */ /* one loop simple processing with one more inner loop to get sorted buckets (insert-sort on lists, Cormen-style) */ for (i = 0; i != a_len; ++i) { int bnum = (10 * a[i]) / maxVal; list_p bptr = bucket[bnum]; list_p belem = malloc(sizeof(list_t)); belem->elem = a[i]; if (bptr == 0) { bucket[bnum] = belem; belem->next = 0; continue; } else if (a[i] <= bptr->elem) { belem->next = bptr; bucket[bnum] = belem; continue; } else { while (bptr != 0) { if ((bptr->elem <= a[i]) && ((bptr->next == 0) || (bptr->next->elem > a[i]))) { belem->next = bptr->next; bptr->next = belem; break; } bptr = bptr->next; } } } /* one loop (looks as two) to get all back */ aidx = 0; for (i = 0; i != 10; ++i) { list_p bptr = bucket[i]; while (bptr) { list_p optr = bptr; a[aidx] = bptr->elem; aidx += 1; bptr = bptr->next; free(optr); } } return 0; } </code></pre> So what do we have here? We are trading some sophisticated bucket structure and requirement for dynamically allocated memory but winning static memory, and multiplier near N in average. Now lets recall what did we see in code: <ol> <li>Counting sort -- simple buckets, simple processing, memory overhead</li> <li>Radix sort -- simple buckets, sophisticated processing, speed overhead (and still need additional static memory)</li> <li>Bucket sort -- sophisticated buckets, simple processing, requires dynamic memory, good in average</li> </ol> Radix and bucket sorts thus are two useful generalizations of counting sort. They have a lot in common with counting sort and with each other but in every case we are losing something and winning something. Software engineering is about a balance between these opportunities.

According to Geekviewpoint: Radix: http://www.geekviewpoint.com/java/sorting/radixsort <blockquote> Radix sort, like counting sort and bucket sort, is an integer based algorithm (i.e. the values of the input array are assumed to be integers). Hence radix sort is among the fastest sorting algorithms around, in theory. The particular distinction for radix sort is that it creates a bucket for each cipher (i.e. digit); as such, similar to bucket sort, each bucket in radix sort must be a growable list that may admit different keys. </blockquote> Bucket: http://www.geekviewpoint.com/java/sorting/bucketsort <blockquote> Bucket sort is actually very good considering that counting sort is reasonably speaking its upper bound. And counting sort is very fast. The particular distinction for bucket sort is that it uses a hash function to partition the keys of the input array, so that multiple keys may hash to the same bucket. Hence each bucket must effectively be a growable list; similar to radix sort. </blockquote> Counting: http://www.geekviewpoint.com/java/sorting/countingsort <blockquote> The particular distinction for counting sort is that it creates a bucket for each value and keep a counter in each bucket. Then each time a value is encountered in the input collection, the appropriate counter is incremented. Because counting sort creates a bucket for each value, an imposing restriction is that the maximum value in the input array be known beforehand. </blockquote> They explain it in more details on their site. Edit: <ul> <li>If you are using radix sort and your numbers are decimal, then you need 10 buckets, one for each digit from 0 to 9.</li> <li>If you are using counting sort, then you need a bucket for each unique value in the input (actually you need a bucket for each value between 0 and max).</li> <li>If you are using bucketsort, you don't know how many buckets you will be using. Whatever hash function you are using will determine the number of buckets.</li> </ul>

Radix sort vs Counting sort vs Bucket sort. What's the difference?

Tags:

algorithm

sorting

radix-sort

bucket-sort

counting-sort

I am reading the definitions of radix, counting and bucket sorts and it seems that all of them are just the code below:

public static void sort(int[] a, int maxVal){
    int [] bucket=new int[maxVal+1];

    for (int i=0; i<bucket.length; i++){
        bucket[i]=0;
    }

    for (int i=0; i<a.length; i++){
        bucket[a[i]]++;
    }

    int outPos=0;
    for (int i=0; i<bucket.length; i++){
        for (int j=0; j<bucket[i]; j++){
            a[outPos++]=i;
        }
    }
}

I know I can't be right, so what am I missing? Show the code if you think that can help explaining in Java or C.

207

asked Jan 16 '13 21:01

good_evening

3 Answers

Let's start with some rewriting your code in C, because C more familiar for me to explain. So lets recall your code with some comments:

int counting_sort(int a[], int a_len, int maxVal) {   int i, j, outPos = 0;   int bucket_len = maxVal+1;   int bucket[bucket_len]; /* simple bucket structure */    memset(bucket, 0, sizeof(int) * bucket_len);    /* one loop bucket processing */   for (i = 0; i < a_len; i++)     {       bucket[a[i]]++; /* simple work with buckets */     }    for (i=0; i < bucket_len; i++)     {       for (j = 0; j < bucket[i]; j++)         {           a[outPos++] = i;         }     }    return 0; }

Now lets offer to this guy some realistic data:

[126, 348, 343, 432, 316, 171, 556, 223, 670, 201]

On output we have

[126, 171, 201, 223, 316, 343, 348, 432, 556, 670]

It seems that everything is ok? Not yet. Lets look at maxVal. It is 670 (!) To sort array of 10 elements here we used array of 670 elements, primarily zeros. Awfully. To handle this problem of counting sort, we have two possible ways of generalization:

1) First way -- to make sort digit-wise. This is called radix-sort. Lets show some code, trying to make it as close to counting-sort code as possible. Again look at comments:

int radix_sort(int a[], int a_len, int ndigits) {   int i;   int b[a_len];   int expn = 1;    /* additional loop for digits */   for (i = 0; i != ndigits; ++i)     {       int j;       int bucket[10] = {0}; /* still simple buckets */        /* bucket processing becomes tricky */       for (j = 0; j != a_len; ++j)         bucket[ a[j] / expn % 10 ]++;        for (j = 1; j != 10; ++j)         bucket[j] += bucket[j - 1];        for (j = a_len - 1; j >= 0; --j)         b[--bucket[a[j] / expn % 10]] = a[j];        for (j = 0; j != a_len; ++j)         a[j] = b[j];        expn *= 10;     } }

We are trading multiplier near N for memory. Profit? Maybe. But in some cases multiplier near N is very important. Program, working a day and working a week are very different from users view even if both works 1*O(N) and 7*O(N) respectively. And so we are coming to a second generalization:

2) Second way -- to make buckets more sophisticated. This is called bucket-sort.

Lets again start with some code. I prefer more code prior to philosophical arguments. Still look at comments, they are essential.

int bucket_sort(int a[], int a_len, int maxVal) {   int i, aidx;    typedef struct tag_list {     int elem;     struct tag_list *next;   } list_t, *list_p;    list_p bucket[10] = {0}; /* sophisticated buckets */    /* one loop simple processing with one more inner loop      to get sorted buckets (insert-sort on lists, Cormen-style) */   for (i = 0; i != a_len; ++i)     {       int bnum = (10 * a[i]) / maxVal;       list_p bptr = bucket[bnum];       list_p belem = malloc(sizeof(list_t));       belem->elem = a[i];       if (bptr == 0)         {           bucket[bnum] = belem;           belem->next = 0;           continue;         }       else if (a[i] <= bptr->elem)         {           belem->next = bptr;           bucket[bnum] = belem;           continue;         }       else         {           while (bptr != 0)             {               if ((bptr->elem <= a[i]) && ((bptr->next == 0) || (bptr->next->elem > a[i])))                 {                   belem->next = bptr->next;                   bptr->next = belem;                   break;                 }                bptr = bptr->next;             }          }     }    /* one loop (looks as two) to get all back */   aidx = 0;    for (i = 0; i != 10; ++i)     {       list_p bptr = bucket[i];       while (bptr)         {           list_p optr = bptr;           a[aidx] = bptr->elem;           aidx += 1;           bptr = bptr->next;           free(optr);         }     }    return 0; }

So what do we have here? We are trading some sophisticated bucket structure and requirement for dynamically allocated memory but winning static memory, and multiplier near N in average.

Now lets recall what did we see in code:

Counting sort -- simple buckets, simple processing, memory overhead
Radix sort -- simple buckets, sophisticated processing, speed overhead (and still need additional static memory)
Bucket sort -- sophisticated buckets, simple processing, requires dynamic memory, good in average

Radix and bucket sorts thus are two useful generalizations of counting sort. They have a lot in common with counting sort and with each other but in every case we are losing something and winning something. Software engineering is about a balance between these opportunities.

172

answered Sep 23 '22 08:09

Konstantin Vladimirov

Radix sort vs Counting sort vs Bucket sort. What's the difference?

Bucket sort places the keys or elements to be sorted into buckets. How they are placed in buckets is arbitrary and can be portions of a composite key and any distribution you like. The individual buckets may need to sort further.

Sorting in memory is faster than sort on disk. However if you have more data than will fit in memory you need another option. What you can do is a bucket sort, where the buckets are small enough to fit into memory. i.e. there is a large number of entries in each bucket. These you can quick sort individually.

Radix sort is a specific type of bucket sort. It starts with the top n-bit or n-digits and may sort those buckets using a radix sort etc., until every entry is sorted.

Counting sort is like using radix sort except you are using the whole value. Instead of recording each object, it has a bucket for each object and it just counts the number of occurrences. This works well when you have a limited number of possible keys and you have many duplicates.

answered Sep 19 '22 08:09

Peter Lawrey

According to Geekviewpoint:

Radix: http://www.geekviewpoint.com/java/sorting/radixsort

Radix sort, like counting sort and bucket sort, is an integer based algorithm (i.e. the values of the input array are assumed to be integers). Hence radix sort is among the fastest sorting algorithms around, in theory. The particular distinction for radix sort is that it creates a bucket for each cipher (i.e. digit); as such, similar to bucket sort, each bucket in radix sort must be a growable list that may admit different keys.

Bucket: http://www.geekviewpoint.com/java/sorting/bucketsort

Bucket sort is actually very good considering that counting sort is reasonably speaking its upper bound. And counting sort is very fast. The particular distinction for bucket sort is that it uses a hash function to partition the keys of the input array, so that multiple keys may hash to the same bucket. Hence each bucket must effectively be a growable list; similar to radix sort.

Counting: http://www.geekviewpoint.com/java/sorting/countingsort

The particular distinction for counting sort is that it creates a bucket for each value and keep a counter in each bucket. Then each time a value is encountered in the input collection, the appropriate counter is incremented. Because counting sort creates a bucket for each value, an imposing restriction is that the maximum value in the input array be known beforehand.

They explain it in more details on their site.

Edit:

If you are using radix sort and your numbers are decimal, then you need 10 buckets, one for each digit from 0 to 9.
If you are using counting sort, then you need a bucket for each unique value in the input (actually you need a bucket for each value between 0 and max).
If you are using bucketsort, you don't know how many buckets you will be using. Whatever hash function you are using will determine the number of buckets.

answered Sep 20 '22 08:09

kasavbere

Related questions
                            
                                Good algorithm and data structure for looking up words with missing letters?
                            
                                Which is better: O(n log n) or O(n^2)
                            
                                How to find the only number in an array that doesn't occur twice [duplicate]
                            
                                What is the right approach when using STL container for median calculation?
                            
                                Use of for_each on map elements
                            
                                Bellman-Ford vs Dijkstra: Under what circumstances is Bellman-Ford better?
                            
                                Higher-order unification
                            
                                What is holding genetic programming back?
                            
                                Fast way to generate pseudo-random bits with a given probability of 0 or 1 for each bit
                            
                                Reason for the number 5381 in the DJB hash function?
                            
                                Finding cartesian product with PHP associative arrays
                            
                                is a "non-decreasing" sequence "increasing"?
                            
                                Differences between time complexity and space complexity?
                            
                                LogLog and HyperLogLog algorithms for counting of large cardinalities
                            
                                Problem: Bob's Sale
                            
                                Faster algorithm to find unique element between two arrays?
                            
                                How to implement O(logn) decrease-key operation for min-heap based Priority Queue?
                            
                                What's the Time Complexity of Average Regex algorithms?
                            
                                What is the fastest way to flatten arbitrarily nested lists in Python? [duplicate]
                            
                                Peak detection of measured signal

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Radix sort vs Counting sort vs Bucket sort. What's the difference?

Tags:

algorithm

sorting

radix-sort

bucket-sort

counting-sort

good_evening

People also ask

3 Answers

Konstantin Vladimirov

Peter Lawrey

kasavbere

Recent Activity

Donate For Us