
strangely slow quicksort for large tables

I have been doing my homework, which is to compare a bunch of sorting algorithms, and I came across a strange phenomenon. Things behaved as expected: insertion sort wins for small tables of around 20 ints, and otherwise quicksort outperforms heapsort and mergesort, up to a table of 500,000 ints (stored in memory). But for 5,000,000 ints (still stored in memory) quicksort suddenly becomes worse than heapsort and mergesort. The numbers are always uniformly distributed random values, and Windows virtual memory is turned off. Does anyone have an idea what could be the cause?

    void quicksortit(T *tab, int s) {
        if (s == 0 || s == 1) return;
        T tmp;
        if (s == 2) {                    // base case: sort a pair directly
            if (tab[0] > tab[1]) {
                tmp = tab[0];
                tab[0] = tab[1];
                tab[1] = tmp;
            }
            return;
        }
        T pivot = tab[s-1];              // last element as pivot
        T *f1, *f2;
        f1 = f2 = tab;
        for (int i = 0; i < s; i++)      // partition: <= pivot left of f1, > pivot between f1 and f2
            if (*f2 > pivot)
                f2++;
            else {
                tmp = *f1;
                *f1 = *f2;
                *f2 = tmp;
                f1++; f2++;
            }
        quicksortit(tab, (f1-1) - tab);  // left part, excluding the pivot at f1-1
        quicksortit(f1, f2 - f1);        // right part
    }
asked Nov 03 '14 by Domin



2 Answers

Your algorithm starts failing when there are many duplicates in the array. You only noticed this at large sizes because you have been feeding the algorithm random values with a large span (I'm assuming you used rand() over 0 to RAND_MAX), and the problem only shows up with large arrays.

When you try to sort an array of identical numbers (try sorting 100,000 identical numbers; the program will crash), you first walk through the entire array, superfluously swapping elements. Then you split the array in two, but the larger part has only been reduced by 1:

                    v
quicksortit(tab,(f1-1)-tab);

Thus your algorithm becomes O(n^2), and you also consume a very large amount of stack. Searching for a better pivot will not help you in this case; instead, choose a version of quicksort() that doesn't exhibit this flaw.

For example:

function quicksort(array)
    if length(array) > 1
        pivot := select middle, or a median of first, last and middle
        left := first index of array
        right := last index of array
        while left <= right
            while array[left] < pivot
                left := left + 1
            while array[right] > pivot
                right := right - 1
            if left <= right
                swap array[left] with array[right]
                left := left + 1
                right := right - 1
        quicksort(array from first index to right)
        quicksort(array from left to last index)

Which is a modified version of: http://rosettacode.org/wiki/Sorting_algorithms/Quicksort

answered Sep 18 '22 by 2501


It could be that your array is now bigger than the L3 cache.

Quicksort's partitioning operation moves random elements from one end of the array to the other. A typical Intel L3 cache is around 8 MB. With 5 million 4-byte elements, your array is 20 MB, and you're writing from one end of it to the other.

Cache misses that go out of L3 hit main memory, which is much slower than a miss served by a higher cache level.

That is, up until now your entire sorting operation was running completely inside the CPU caches.

answered Sep 18 '22 by Rafael Baptista