
Quicksort weird time complexity, C++

I've been testing the time complexity of different sorting algorithms on different number sequences, and it was all going well until I got quicksort's results (with the pivot in the middle) for sequences where one half is ascending and the other descending. The graph:

[Graph: running times for "V" and "A" sequences]

(By "V" I mean a sequence in which the first half is descending, and the other ascending, and by "A" I mean a sequence where the first half is ascending, and the other half is descending.)

Results for other kinds of sequences look as I would expect, but maybe there is something wrong with my algorithm?

void quicksort(int l, int p, int *tab)
{
    int i = l, j = p, x = tab[(l + p) / 2], w; // x - pivot
    do
    {
        while (tab[i] < x)
        {
            i++;
        }
        while (x < tab[j])
        {
            j--;
        }
        if (i <= j)
        {
            w = tab[i];
            tab[i] = tab[j];
            tab[j] = w;
            i++;
            j--;
        }
    }
    while (i <= j);
    if (l < j)
    {
        quicksort(l, j, tab);
    }
    if (i < p)
    {
        quicksort(i, p, tab);
    }
}

Does anybody have an idea what caused such weird results?

asked Apr 05 '16 by vois



2 Answers

TL;DR: The problem is the pivot-choosing strategy, which makes repeatedly poor choices on these types of inputs (A- and V-shaped sequences). These result in quicksort making highly "imbalanced" recursive calls, which in turn result in the algorithm performing very poorly (quadratic time for A-shaped sequences).

Congratulations, you've (re)discovered an adversarial input (or rather a family of inputs) for the version of quicksort that chooses the middle element as the pivot.

For reference, an example of an A-shaped sequence is 1 2 3 4 3 2 1, i.e., a sequence that increases, reaches its peak at the middle, and then decreases; an example of a V-shaped sequence is 4 3 2 1 2 3 4, i.e., a sequence that decreases, reaches its minimum at the middle, and then increases.

Think about what happens when you pick the middle element as the pivot of an A- or a V-shaped sequence. In the first case, when you pass the algorithm the A-shaped sequence 1 2 ... n-1 n n-1 ... 2 1, the pivot is the largest element of the array (the largest element of an A-shaped sequence is the middle one, and you choose the middle element as the pivot), so you will make recursive calls on subarrays of sizes 0 (your code doesn't actually make the call on 0 elements) and n-1. In the next call, on the subarray of size n-1, you will again pick the largest element of the subarray (which is the second-largest element of the original array) as the pivot; and so on. In each step you essentially pass on almost the whole array (all elements except the pivot), so the sizes of the subarrays in the recursive calls are highly imbalanced, and the running time is O(n)+O(n-1)+...+O(1) = O(n^2).

Here's the trace for the A-shaped sequence 1 2 3 4 5 4 3 2 1:

blazs@blazs:/tmp$ ./test 
pivot=5
   1   2   3   4   1   4   3   2   5
pivot=4
   1   2   3   2   1   3   4   4
pivot=3
   1   2   3   2   1   3
pivot=3
   1   2   1   2   3
pivot=2
   1   2   1   2
pivot=2
   1   1   2
pivot=1
   1   1
pivot=4
   4   4
   1   1   2   2   3   3   4   4   5

You can see from the trace that at each recursive call the algorithm chooses a largest element (there can be up to two largest elements, hence the article a, not the) as the pivot. This means that the running time for the A-shaped sequence really is O(n)+O(n-1)+...+O(1) = O(n^2). (In the technical jargon, the A-shaped sequence is an example of an adversarial input that forces the algorithm to perform poorly.)

This means that if you plot running times for "perfectly" A-shaped sequences of the form

1 2 3 ... n-1 n n-1 ... 3 2 1

for increasing n, you will see a nice quadratic function. Here's a graph I just computed for n = 5, 105, 205, 305, ..., 9905 for A-shaped sequences 1 2 ... n-1 n n-1 ... 2 1:

Running times for A-shaped sequences

In the second case, when you pass the algorithm a V-shaped sequence, you choose the smallest element of the array as the pivot, and will thus make recursive calls on subarrays of sizes n-1 and 0 (your code doesn't actually make the call on 0 elements). In the next call, on the subarray of size n-1, you will pick the largest element as the pivot; and so on. (You won't always make such terrible choices, though; this case is slightly more complicated because it depends on how you do the "moving" step, and it's hard to say much more about it.) This results in poor performance for similar reasons.

Here's a graph of running times for V-shaped sequences n n-1 ... 2 1 2 ... n-1 n for n=5,105,205,...,49905. The running times are somewhat less regular---as I said it is more complicated because you don't always pick the smallest element as the pivot. The graph:

Running times for V-shaped sequences for increasing sizes.

Code that I used to measure time:

#include <stdlib.h> // malloc, free
#include <time.h>   // clock, CLOCKS_PER_SEC

double seconds(size_t n) {
    int *tab = (int *)malloc(sizeof(int) * (2*n - 1));
    size_t i;

    // construct A-shaped sequence 1 2 3 ... n-1 n n-1 ... 3 2 1
    for (i = 0; i < n-1; i++) {
        tab[i] = tab[2*n-i-2] = i+1;
        // To generate V-shaped sequence, use tab[i]=tab[2*n-i-2]=n-i;
    }
    tab[n-1] = n;
    // For V-shaped sequence use tab[n-1] = 1;

    clock_t start = clock();
    quicksort(0, 2*n-2, tab);
    clock_t finish = clock();

    free(tab);

    return (double) (finish - start) / CLOCKS_PER_SEC;
}

I adapted your code to print the "trace" of the algorithm so that you can play with it yourself and gain insight into what's going on:

#include <stdio.h>

void print(int *a, size_t l, size_t r);
void quicksort(int l,int p,int *tab);

int main() {
    int tab[] = {1,2,3,4,5,4,3,2,1};
    size_t sz = sizeof(tab) / sizeof(int);

    quicksort(0, sz-1, tab);
    print(tab, 0, sz-1);

    return 0;
}


void print(int *a, size_t l, size_t r) {
    size_t i;
    for (i = l; i <= r; ++i) {
        printf("%4d", a[i]);
    }
    printf("\n");
}

void quicksort(int l, int p, int *tab)
{
    int i = l, j = p, x = tab[(l + p) / 2], w; // x - pivot
    printf("pivot=%d\n", x);
    do
    {
        while (tab[i] < x)
        {
            i++;
        }
        while (x < tab[j])
        {
            j--;
        }
        if (i <= j)
        {
            w = tab[i];
            tab[i] = tab[j];
            tab[j] = w;
            i++;
            j--;
        }
    }
    while (i <= j);

    print(tab, l, p);
    if (l < j)
    {
        quicksort(l, j, tab);
    }
    if (i < p)
    {
        quicksort(i, p, tab);
    }
}

By the way, I think the graph showing the running times would be smoother if you took the average of, say, 100 running times for each input sequence.

We see that the problem here is the pivot-choosing strategy. Let me note that you can alleviate the problems with adversarial inputs by randomizing the pivot-choosing step. The simplest approach is to pick the pivot uniformly at random (each element is equally likely to be chosen as the pivot); you can then show that the algorithm runs in O(n log n) time with high probability. (Note, however, that to show this sharp tail bound you need some assumptions on the input; the result certainly holds if the numbers are all distinct; see, for example, Motwani and Raghavan's Randomized Algorithms book.)

To corroborate my claims, here's the graph of running times for the same sequences if you choose the pivot uniformly at random, with x = tab[l + (rand() % (p-l))]; (make sure you call srand(time(NULL)) in main). For A-shaped sequences:

[Graph: running times for A-shaped sequences with a random pivot]

For V-shaped sequences:

[Graph: running times for V-shaped sequences with a random pivot]

answered Oct 02 '22 by blazs


In quicksort, one of the major things that affects running time is whether the input is random.

Generally, choosing the pivot from a fixed position is not a good idea unless you are certain the input is randomly shuffled. Median-of-three partitioning is one widely used way to make the pivot choice more robust; your code doesn't implement it.

Also, recursive quicksort incurs some overhead (it uses the call stack and has to set up each function call and its parameters), so it's advisable, once the size of the remaining subarray is around 10-20 elements, to switch to another sorting algorithm such as insertion sort; this can make the sort roughly 20% faster.

void quicksort(int l, int p, int *tab)
{
    if (p - l + 1 <= 10)
    {
        insertion_sort(tab, l, p); // small subarray: insertion sort
        return;
    }
    // ... partition and recurse as before ...
}

Something of this nature.

In general, quicksort's best-case running time is O(n log n); its worst-case running time is O(n^2), often caused by non-random inputs or many duplicate keys.

answered Oct 02 '22 by Muyide Ibukun