Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

quick sort algorithm improvement if more duplicate keys

I am reading quick sort algorithm in book by Robert Sedwick Algorithms and data structures part 1-4.

template <class item>

static void quicksort(item [] a, int l, int r)
{
    if(r <= l) return;
    int i = partition(a,l,r);
    quicksort(a, l, i-1);
    quicksort(a, i+1, r);
}

template <class item>
int partition(item a[], int l, int r)
{
    int i = l-1; j = r; item v = a[r];

    for(;;) {
        while( a[++i] < v );
        while( v < a[--j] ) 
            if( j == l ) 
                break;
        if( i >= j) 
            break;  // pointer crossing.
        exch(a[i], a[j]);
    }

    exch(a[i], a[r]);
    return i;
}

Book has following text on above algorithm.

When duplicate keys are present in the file, the pointer crossing is subtle. we could improve the partitioning process slightly by terminating the scans when i < j, and then using j, rather than i-1, to delimit the right end of the left subfile for the first recursive call. Letting the loop iterate one more time in this case is an improvement, because, when ever the scanning loops terminate with j and i referring to the same element, we end up with two elements in their final positions: the element that stopped both scans, which must therefore be equal to the partitioning element, and the partitioning element itself. This change is probably worth making, because, in this particular case, the program leaves a record with a key equal to the partitioning key in a[r], and that makes the first partition in the call quick-sort(a, i+1, r) degenerate, because its right most key is its smallest.

My questions are

  1. How can we modify above program with description below? I am having difficulty in modifying it to understand text.
  2. Why above quick sort algorithm does not work effectively if more duplicate keys are present.
  3. How above modification improve if more duplication keys are present?
  4. What does author mean by "he first partition in the call quick-sort(a, i+1, r) degenerate, because its right most key is its smallest." ? What does author mean by degenerate here?

Thanks for your time and help.

like image 798
venkysmarty Avatar asked Nov 12 '12 06:11

venkysmarty


People also ask

Does Quick sort work with duplicates?

When the list of items to be sorted contains a lot of duplicate values, we can improve QuickSort by grouping all the values that are equal to the pivot to the middle and then we recursively QuickSort those values on the left and those values on the right.

How can you improve the performance of the quicksort algorithm?

Instead of picking the first element as pivot every time, if we pick a random element from the unexplored array and swap it with the first element and then perform the partitioning procedure (any one of the above two), then it will improve the expected or average time complexity to O(N*log N).

When implementing quick sort if the array contains lots of duplicates?

When implementing quicksort, if the array contains lots of duplicates, it may be better to perform a three-way partition (into elements less than, equal to, and greater than the pivot), to make smaller recursive calls. Assume three-way comparisons, as provided by the compareTo method.


1 Answers

>>Why above quick sort algorithm does not work effectively if more duplicate keys are present?

It becomes inefficient because your breaking condition is: if(i >= j) break;
so, as you scan from both sides with i and j, It is quite possible that you break when i == j instead of letting i surpass over j.

What bad could possibly happen when we break on i==j when many duplicate keys are present ?

When you break for i==j; from first while loop you must have had a[i] >= v and from second while loop a[j] <=v but since we are considering a 'break' for: i==j so, a[i] = a[j] = v i.e. a[i] is same as v, your pivot element.

In such a scenario, your outermost exch(a[i], a[r]); will simply exchange pivot value to itself.
Hence, in your next recursive call quicksort(a, i+1, r); for Right-half of the array, you would have minimum element sitting at the rightmost end.( your pivot choosing strategy is simply, item v = a[r]; ) and we all know it is bad for QuickSort to choose a pivot element which amounts to the minimum or the maximum of the array. Hence, your subsequent recursive call for right-half will be a degenerate one.
That is why author is advising not to break for i==j but catch them just before that happens.

>>What does author mean by degenerate here?

Degenerate here means, the recursion tree is getting skewed i.e. the subsequent problems are not being generated of nearly equal sizes. You are dividing a problem of size N into something like problems of size N-1 and 1 instead of something more balanced, like dividing it into problems of size N/2 and N/2.

>>How can we modify above program with description below?

We could implement it like following:

int partition(int A[], int l, int r){
        int i=l-1, j=r, v = A[r];
        for(;;){
                while(A[++i] < v);
                while(A[--j] > v)
                        if(j == l)
                                break;
                if(i>=j)
                        break;
                swap(A[i], A[j]);
        }
        if(i == j){// case when we stopped at the pivot element.
                j = j+1;//backtrack j 1 step.
                if(j <= r)
                    swap(A[j], A[r]);
                return j;// partition the subsequent problems around j now.
        }
        swap(A[i], A[r]);
        return i;
}

>>How above modification improve if more duplication keys are present?
It improves the performance by letting you NOT generate an obvious scenario of a degenerate case.

like image 142
srbhkmr Avatar answered Sep 30 '22 19:09

srbhkmr