
Quicksort weird time complexity, C++

I've been testing the time complexity of different sorting algorithms on different number sequences, and it was all going well until I got quicksort's results (with the pivot in the middle) for sequences where one half is ascending and the other descending. The graph:

[Graph: running times for "V" and "A" sequences]

(By "V" I mean a sequence in which the first half is descending, and the other ascending, and by "A" I mean a sequence where the first half is ascending, and the other half is descending.)

Results for other kinds of sequences look as I would expect, but maybe there is something wrong with my algorithm?

void quicksort(int l, int p, int *tab)
{
    int i = l, j = p, x = tab[(l + p) / 2], w; // x - pivot
    do
    {
        while (tab[i] < x)
        {
            i++;
        }
        while (x < tab[j])
        {
            j--;
        }
        if (i <= j)
        {
            w = tab[i];
            tab[i] = tab[j];
            tab[j] = w;
            i++;
            j--;
        }
    }
    while (i <= j);
    if (l < j)
    {
        quicksort(l, j, tab);
    }
    if (i < p)
    {
        quicksort(i, p, tab);
    }
}

Does anybody have an idea what caused such weird results?

asked Apr 05 '16 by vois



2 Answers

TL;DR: The problem is the pivot-choosing strategy, which makes repeatedly poor choices on these types of inputs (A- and V-shaped sequences). These result in quicksort making highly "imbalanced" recursive calls, which in turn result in the algorithm performing very poorly (quadratic time for A-shaped sequences).

Congratulations, you've (re)discovered an adversarial input (or rather a family of inputs) for the version of quicksort that chooses the middle element as the pivot.

For reference, an example of an A-shaped sequence is 1 2 3 4 3 2 1, i.e., a sequence that increases, reaches its peak at the middle, and then decreases; an example of a V-shaped sequence is 4 3 2 1 2 3 4, i.e., a sequence that decreases, reaches its minimum at the middle, and then increases.

Think about what happens when you pick the middle element as the pivot of an A- or a V-shaped sequence. In the first case, when you pass the algorithm the A-shaped sequence 1 2 ... n-1 n n-1 ... 2 1, the pivot is the largest element of the array (the largest element of an A-shaped sequence is the middle one, and you choose the middle element as the pivot), so you will make recursive calls on subarrays of sizes 0 (your code doesn't actually make the call on 0 elements) and n-1. In the next call, on the subarray of size n-1, you will again pick the largest element of the subarray (which is the second-largest element of the original array) as the pivot; and so on. In each step you essentially pass on almost the whole array (all elements except the pivot), so the sizes of the subarrays in the recursive calls are highly imbalanced, and the running time is O(n)+O(n-1)+...+O(1) = O(n^2).

Here's the trace for the A-shaped sequence 1 2 3 4 5 4 3 2 1:

blazs@blazs:/tmp$ ./test 
pivot=5
   1   2   3   4   1   4   3   2   5
pivot=4
   1   2   3   2   1   3   4   4
pivot=3
   1   2   3   2   1   3
pivot=3
   1   2   1   2   3
pivot=2
   1   2   1   2
pivot=2
   1   1   2
pivot=1
   1   1
pivot=4
   4   4
   1   1   2   2   3   3   4   4   5

You can see from the trace that at each recursive call the algorithm chooses a largest element (there can be up to two largest elements, hence the article a, not the) as the pivot. This means that the running time for the A-shaped sequence really is O(n)+O(n-1)+...+O(1) = O(n^2). (In the technical jargon, the A-shaped sequence is an example of an adversarial input that forces the algorithm to perform poorly.)

This means that if you plot running times for "perfectly" A-shaped sequences of the form

1 2 3 ... n-1 n n-1 ... 3 2 1

for increasing n, you will see a nice quadratic function. Here's a graph I just computed for n = 5, 105, 205, 305, ..., 9905 for A-shaped sequences 1 2 ... n-1 n n-1 ... 2 1:

Running times for A-shaped sequences

In the second case, when you pass the algorithm a V-shaped sequence, you choose the smallest element of the array as the pivot, and will thus make recursive calls on subarrays of sizes n-1 and 0 (your code doesn't actually make the call on 0 elements). In the next call, on the subarray of size n-1, you will pick the largest element as the pivot; and so on. (You won't always make such terrible choices, though; this case is slightly more complicated because it depends on how you do the "moving" step, and it's hard to say much more about it.) This results in poor performance for similar reasons.

Here's a graph of running times for V-shaped sequences n n-1 ... 2 1 2 ... n-1 n for n=5,105,205,...,49905. The running times are somewhat less regular---as I said it is more complicated because you don't always pick the smallest element as the pivot. The graph:

Running times for V-shaped sequences for increasing sizes.

Code that I used to measure time:

#include <stdlib.h> // malloc, free
#include <time.h>   // clock, CLOCKS_PER_SEC

double seconds(size_t n) {
    int *tab = (int *)malloc(sizeof(int) * (2*n - 1));
    size_t i;

    // construct A-shaped sequence 1 2 3 ... n-1 n n-1 ... 3 2 1
    for (i = 0; i < n-1; i++) {
        tab[i] = tab[2*n-i-2] = i+1;
        // To generate V-shaped sequence, use tab[i]=tab[2*n-i-2]=n-i;
    }
    tab[n-1] = n;
    // For V-shaped sequence use tab[n-1] = 1;

    clock_t start = clock();
    quicksort(0, 2*n-2, tab);
    clock_t finish = clock();

    free(tab);

    return (double) (finish - start) / CLOCKS_PER_SEC;
}

I adapted your code to print the "trace" of the algorithm so that you can play with it yourself and gain insight into what's going on:

#include <stdio.h>

void print(int *a, size_t l, size_t r);
void quicksort(int l,int p,int *tab);

int main() {
    int tab[] = {1,2,3,4,5,4,3,2,1};
    size_t sz = sizeof(tab) / sizeof(int);

    quicksort(0, sz-1, tab);
    print(tab, 0, sz-1);

    return 0;
}


void print(int *a, size_t l, size_t r) {
    size_t i;
    for (i = l; i <= r; ++i) {
        printf("%4d", a[i]);
    }
    printf("\n");
}

void quicksort(int l, int p, int *tab)
{
    int i = l, j = p, x = tab[(l + p) / 2], w; // x - pivot
    printf("pivot=%d\n", x);
    do
    {
        while (tab[i] < x)
        {
            i++;
        }
        while (x < tab[j])
        {
            j--;
        }
        if (i <= j)
        {
            w = tab[i];
            tab[i] = tab[j];
            tab[j] = w;
            i++;
            j--;
        }
    }
    while (i <= j);

    print(tab, l, p);
    if (l < j)
    {
        quicksort(l, j, tab);
    }
    if (i < p)
    {
        quicksort(i, p, tab);
    }
}

By the way, I think the graph showing the running times would be smoother if you took the average of, say, 100 running times for each input sequence.

We see that the problem here is the pivot-choosing strategy. Let me note that you can alleviate the problems with adversarial inputs by randomizing the pivot-choosing step. The simplest approach is to pick the pivot uniformly at random (each element is equally likely to be chosen as the pivot); you can then show that the algorithm runs in O(n log n) time with high probability. (Note, however, that to show this sharp tail bound you need some assumptions on the input; the result certainly holds if the numbers are all distinct; see, for example, Motwani and Raghavan's Randomized Algorithms book.)

To corroborate my claims, here's the graph of running times for the same sequences if you choose the pivot uniformly at random, with x = tab[l + (rand() % (p-l))]; (make sure you call srand(time(NULL)) in main). For A-shaped sequences:

[Graph: running times for A-shaped sequences with a random pivot]

For V-shaped sequences:

[Graph: running times for V-shaped sequences with a random pivot]

answered Oct 02 '22 by blazs


In quicksort, one of the major things that affects running time is whether the input is random.

Generally, choosing the pivot from a fixed position is not a good idea unless you are certain the input is randomly shuffled. Median-of-three partitioning is one widely used way to make the pivot choice more robust; your code doesn't implement it.

Also, recursive quicksort incurs some overhead (it uses the call stack and has to set up each function call and its parameters), so it's advisable, once the size of the remaining subarray is around 10-20 elements, to switch to another sorting algorithm such as insertion sort; this can make the sort roughly 20% faster.

void quicksort(int l, int p, int *tab)
{
    if (p - l + 1 <= 10)
    {
        insertion_sort(tab, l, p); // small subarray: insertion sort
        return;
    }
    // ... partition and recurse as before ...
}

Something of this nature.

In general, quicksort's best-case running time is O(n log n); its worst-case running time is O(n^2), often caused by non-random inputs or many duplicate keys.

answered Oct 02 '22 by Muyide Ibukun