I have been doing my homework, which is to compare a bunch of sorting algorithms, and I have come across a strange phenomenon. Things have been as expected: insertion sort wins for something like a table of 20 ints; otherwise quicksort outperforms heapsort and mergesort, up to a table of 500,000 ints (stored in memory). For 5,000,000 ints (still stored in memory), quicksort suddenly becomes worse than heapsort and mergesort. The numbers are always uniformly distributed random values, and Windows virtual memory is turned off. Does anyone have an idea what could be the cause of this?
template<typename T>
void quicksortit(T *tab, int s) {
    if (s == 0 || s == 1) return;
    T tmp;
    if (s == 2) {                      // two elements: compare and swap directly
        if (tab[0] > tab[1]) {
            tmp = tab[0];
            tab[0] = tab[1];
            tab[1] = tmp;
        }
        return;
    }
    T pivot = tab[s - 1];              // last element is the pivot
    T *f1, *f2;
    f1 = f2 = tab;
    for (int i = 0; i < s; i++)        // partition: <= pivot before f1, > pivot between f1 and f2
        if (*f2 > pivot)
            f2++;
        else {
            tmp = *f1;
            *f1 = *f2;
            *f2 = tmp;
            f1++; f2++;
        }
    quicksortit(tab, (f1 - 1) - tab);  // left part, excluding the pivot now at f1 - 1
    quicksortit(f1, f2 - f1);          // right part: elements greater than the pivot
}
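(For context, here is a minimal timing harness in the spirit of the comparison described above. The table size, the seed, the 0 to RAND_MAX range and the use of std::mt19937 are my assumptions, not part of the original setup.)

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <random>
#include <vector>

int main() {
    const int n = 5000000;                         // assumed table size
    std::mt19937 gen(12345);                       // fixed seed for repeatability
    std::uniform_int_distribution<int> dist(0, RAND_MAX);
    std::vector<int> tab(n);
    for (int &x : tab) x = dist(gen);              // uniformly distributed random ints

    auto t0 = std::chrono::steady_clock::now();
    quicksortit(tab.data(), n);                    // the function shown above
    auto t1 = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::printf("sorted: %s, time: %lld ms\n",
                std::is_sorted(tab.begin(), tab.end()) ? "yes" : "no",
                (long long)ms);
    return 0;
}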
Merge sort works well on data sets of any size, large or small, whereas quicksort can perform poorly on large datasets.
Additional storage space requirement: merge sort is not in-place because it requires additional memory to store the auxiliary arrays. Quicksort is in-place, as it doesn't require any additional storage beyond the recursion stack.
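To illustrate the extra memory, here is a minimal top-down merge sort sketch; the name mergesortit simply mirrors the question's naming and is not from the post. The std::vector is the auxiliary buffer that makes merge sort not in-place.

#include <vector>

template<typename T>
void mergesortit(T *tab, int s) {
    if (s <= 1) return;
    int mid = s / 2;
    mergesortit(tab, mid);                 // sort left half
    mergesortit(tab + mid, s - mid);       // sort right half
    std::vector<T> buf;                    // auxiliary storage: O(n) extra memory
    buf.reserve(s);
    int i = 0, j = mid;
    while (i < mid && j < s)               // merge the two sorted halves
        buf.push_back(tab[i] <= tab[j] ? tab[i++] : tab[j++]);
    while (i < mid) buf.push_back(tab[i++]);
    while (j < s)   buf.push_back(tab[j++]);
    for (int k = 0; k < s; k++) tab[k] = buf[k];   // copy the merged result back
}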
The answer depends on the strategy for choosing the pivot. In early versions of quicksort, where the leftmost (or rightmost) element is chosen as the pivot, the worst case occurs in the following situations: 1) the array is already sorted in the same order; 2) the array is already sorted in reverse order.
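A common mitigation is a median-of-three pivot: pick the median of the first, middle and last elements. A minimal sketch; the helper name median_of_three is mine, not from the answer:

#include <utility>   // std::swap

// Returns the index whose value is the median of tab[lo], tab[mid], tab[hi].
template<typename T>
int median_of_three(T *tab, int lo, int mid, int hi) {
    if (tab[lo]  > tab[mid]) std::swap(lo, mid);
    if (tab[mid] > tab[hi])  std::swap(mid, hi);
    if (tab[lo]  > tab[mid]) std::swap(lo, mid);
    return mid;              // after the three compare-swaps, mid indexes the median value
}

The caller would swap tab[median_of_three(tab, 0, s / 2, s - 1)] into the pivot position before partitioning; this avoids the sorted and reverse-sorted worst cases, though it does not fix the duplicate-heavy case discussed below.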
Your algorithm starts failing when there are many duplicates in the array. You only noticed this at large sizes because you have been feeding the algorithm random values with a large span
(I'm assuming you used rand() with the range 0 to RAND_MAX), and that problem only appears with large arrays.
When you try to sort an array of identical numbers (try sorting 100,000 identical numbers; the program will crash), you will first walk through the entire array, superfluously swapping elements. Then you split the array into two, but the larger part has only been reduced by 1:
quicksortit(tab, (f1 - 1) - tab);   // <-- this call still receives s - 1 elements
Thus your algorithm becomes O(n^2), and you also consume a very large amount of stack. Searching for a better pivot will not help you in this case; rather, choose a version of quicksort() that doesn't exhibit this flaw.
For example:
function quicksort(array)
    if length(array) > 1
        pivot := select middle, or a median of first, last and middle
        left := first index of array
        right := last index of array
        while left <= right
            while array[left] < pivot
                left := left + 1
            while array[right] > pivot
                right := right - 1
            if left <= right
                swap array[left] with array[right]
                left := left + 1
                right := right - 1
        quicksort(array from first index to right)
        quicksort(array from left to last index)
Which is a modified version of: http://rosettacode.org/wiki/Sorting_algorithms/Quicksort
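A sketch of what that pseudocode might look like in the same pointer-and-length style as the question's code (my translation, not part of the linked page; the name quicksort_hoare is mine):

#include <utility>   // std::swap

// Middle-element pivot with a Hoare-style partition: duplicates are split
// between both sides, so arrays of identical values no longer degrade to O(n^2).
template<typename T>
void quicksort_hoare(T *tab, int s) {
    if (s <= 1) return;
    T pivot = tab[s / 2];                      // middle element as pivot
    int left = 0, right = s - 1;
    while (left <= right) {
        while (tab[left]  < pivot) left++;
        while (tab[right] > pivot) right--;
        if (left <= right) {
            std::swap(tab[left], tab[right]);
            left++;
            right--;
        }
    }
    quicksort_hoare(tab, right + 1);           // first index .. right
    quicksort_hoare(tab + left, s - left);     // left .. last index
}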
It could be that your array is now bigger than the L3 cache.
Quicksort's partitioning operation moves random elements from one end of the array to the other. A typical Intel L3 cache is about 8 MB. With 5M 4-byte elements your array is 20 MB, and you're writing from one end of it to the other.
Cache misses out of L3 go to main memory and can be much slower than higher-level cache misses.
That is, up until now your entire sorting operation was running completely inside the CPU's caches.
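A rough back-of-the-envelope check (the 8 MB L3 size is an assumption about a typical part, not a measurement):

#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t n = 5000000;                    // elements in the benchmark
    const std::size_t array_bytes = n * sizeof(int);  // 4-byte ints -> ~20 MB
    const std::size_t l3_bytes = 8u * 1024 * 1024;    // assumed 8 MB L3 cache
    std::printf("array: %zu MiB, L3: %zu MiB, fits in L3: %s\n",
                array_bytes >> 20, l3_bytes >> 20,
                array_bytes <= l3_bytes ? "yes" : "no");
    return 0;
}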