Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quicksort linear time?

I'm doing an analysis for the quicksort (qsort from c++ STL) algorithm, the code is:

#include <iostream>
#include <fstream>
#include <ctime>
#include <bits/stdc++.h>
#include <cstdlib>
#include <iomanip>

#define MIN_ARRAY 256000
#define MAX_ARRAY 1000000000
#define MAX_RUNS 100

using namespace std;

int* random_array(int size) {
    int* array = new int[size];

    for (int c = 0; c < size; c++) {
        array[c] = rand()*rand() % 1000000;
    }

    return array;
}

int compare(const void* a, const void* b) { 
    return (*(int*)a - *(int*)b); 
}

int main()
{
    ofstream fout;
    fout.open("data.csv");
    fout << "array size,";
    srand(time(NULL));
    int size;
    int counter = 1;

    std::clock_t start;
    double duration;

    for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
        fout << size << ",";
    }
    fout << "\n";

    for (counter = 1; counter <= MAX_RUNS; counter++) {
        fout << "run " << counter << ",";
        for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
            try {
                int* arr = random_array(size);

                start = std::clock();
                qsort(arr, size, sizeof(int), compare);
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;

                //cout << "size: " << size << " duration: " << duration << '\n';
                fout << setprecision(15) << duration << ",";

                delete[] arr;
            }
            catch (bad_alloc) {
                cout << "bad alloc caught, size: " << size << "\n";
                fout << "bad alloc,";
            }

        }
        fout << "\n";
        cout << counter << "% done\n";
    }
    
    fout.close();
    return 0;
}

when I run this, the data comes back perfectly linear:

data

what on earth is going on? Isnt quicksort O(nlogn)?

Here's the array sizes used and the average time in seconds for each size for all 100 runs:

arraysize,256000,512000,1024000,2048000,4096000,8192000,16384000,32768000,65536000,131072000,262144000,524288000
average,0.034,0.066,0.132,0.266,0.534,1.048,2.047,4.023,7.951,15.833,31.442
like image 380
pickle Avatar asked Mar 16 '21 17:03

pickle


People also ask

Is quicksort linear time?

The real work of quicksort happens during the divide step, which partitions subarray array[p.. r] around a pivot drawn from the subarray.

Can we do sorting in linear time?

since you have a range of integers, then yes, it can be linear, but this won't always be the case. This is also known as "bucket sort".

What is the time complexity of quicksort?

Time Complexity The best-case time complexity of quicksort is O(n*logn). Average Case Complexity - It occurs when the array elements are in jumbled order that is not properly ascending and not properly descending. The average case time complexity of quicksort is O(n*logn).

Why is quicksort O N 2?

The worst case time complexity of a typical implementation of QuickSort is O(n2). The worst case occurs when the picked pivot is always an extreme (smallest or largest) element. This happens when input array is sorted or reverse sorted and either first or last element is picked as pivot.


2 Answers

It is, on average, indeed O(N log N).

It's just that a graph of f(N) = N log(N) appears remarkably linear.

Graph it and see for yourself, or refer to the one below. This average time is what makes the algorithm so very clever:

enter image description here

like image 124
Bathsheba Avatar answered Sep 30 '22 18:09

Bathsheba


The reason the slope looks linear is partly the slow change in Log(N) but the main reason is that the random numbers that populate the array are limited to [0-1,000,000). This causes the large arrays to mostly be filled with duplicates and as the qsort algorithm gets down to smaller groups, sorting becomes much faster. As the array size increases from, say 10,000,000 to 20,000,000 the duplicates double in count on average and hence the sorting tracks almost entirely linearly.

This can be seen from the following graph:

Sort time of unconstrained and constrained int arrays

The orange and grey lines are the execution times of the unconstrained and constrained arrays. The yellow and blue lines are linear from 0 to the end points of two runs. One run has ints constrained to [0-1000000) from the original code. The other isunconstrained to the 2^31 positive ints. Note how much longer the unconstrained sort takes since sorting of the increasing groups of duplicates is extremely fast.

Here's the code modification that shows the unconstrained execution time has a distinct curve as one would expect from NLogN.

#include <iostream>
#include <fstream>
#include <ctime>
#include <cstdlib>
#include <iomanip>

#define MIN_ARRAY 256000
#define MAX_ARRAY 1000000000
#define MAX_RUNS 100

using namespace std;

int* random_array(int size) {
    int* array = new int[size];

    for (int c = 0; c < size; c++) {
        // array[c] = rand() * rand() % 1000000;
            // Note that as the array size grows beyond 1000000
            // this will produce increasing numbers of duplicates
            // which will shorten the time when the subsets get small
    
        array[c] = (rand() << 16) | (rand() << 1) | (rand() & 1);
            // Note that in this example/system, RAND_MAX==0x7fff
            // get a random positive int distributed in the set of positive, 32 bit ints
    }

    return array;
}

int compare(const void* a, const void* b) {
    return (*(int*)a - *(int*)b);
}

int main()
{
    auto x = RAND_MAX;
    ofstream fout;
    fout.open("data.csv");
    fout << "array size,";
    srand(time(NULL));
    int size;
    int counter = 1;

    std::clock_t start;
    double duration;

    for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
        fout << size << ",";
    }
    fout << "\n";

    for (counter = 1; counter <= MAX_RUNS; counter++) {
        fout << "run " << counter << ",";
        for (size = MIN_ARRAY; size < MAX_ARRAY; size *= 2) {
            try {
                int* arr = random_array(size);

                start = std::clock();
                qsort(arr, size, sizeof(int), compare);
                duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;

                cout << "size: " << size << " duration: " << duration << '\n';
                fout << setprecision(15) << duration << ",";

                delete[] arr;
            }
            catch (bad_alloc) {
                cout << "bad alloc caught, size: " << size << "\n";
                fout << "bad alloc,";
            }

        }
        fout << "\n";
        cout << counter << "% done\n";
    }

    fout.close();
    return 0;
}
like image 45
doug Avatar answered Sep 30 '22 16:09

doug