I am sorting an array of integer keys.
stdlib.h qsort;
This is slow: right now my function spends 0.6s on sorting per execution, while with stdlib.h qsort it takes 1.0s; qsort's performance here is about the same as std::sort's.
Timsort;
I tried this: https://github.com/swenson/sort and this: http://code.google.com/p/timsort/source/browse/trunk/timSort.c?spec=svn17&r=17; both were significantly slower than stdlib qsort.
http://www.ucw.cz/libucw/;
Their combination of quicksort and insertion sort has been the fastest on my data so far. I experimented with various settings; taking the middle element as the pivot (not median-of-3) and switching to insertion sort for sub-arrays of 28 elements (instead of the default 8) gives the best performance (a rough sketch of this kind of hybrid follows this list).
shell sort;
A simple implementation with the gap sequence from this article: http://en.wikipedia.org/wiki/Shellsort. It was decent, although slower than stdlib qsort.
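For reference, here is a rough sketch of the kind of quicksort/insertion-sort hybrid described in the libucw item above, with a middle-element pivot and a 28-element cutoff. This is not the libucw code, just an illustration of the structure; the cutoff constant and function names are mine:

```c
#include <stddef.h>

#define INSERTION_CUTOFF 28   /* switch point mentioned above; tune for your data */

/* Plain insertion sort on a[lo..hi] (inclusive). */
static void insertion_sort(int *a, long lo, long hi)
{
    for (long i = lo + 1; i <= hi; i++) {
        int key = a[i];
        long j = i - 1;
        while (j >= lo && a[j] > key) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

/* Quicksort with a middle-element pivot that falls back to insertion sort
 * on small ranges; recurses into the smaller partition to bound stack depth. */
static void hybrid_quicksort(int *a, long lo, long hi)
{
    while (hi - lo + 1 > INSERTION_CUTOFF) {
        int pivot = a[lo + (hi - lo) / 2];      /* middle element, no median-of-3 */
        long i = lo, j = hi;
        while (i <= j) {
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++; j--;
            }
        }
        if (j - lo < hi - i) {                  /* handle smaller side first */
            hybrid_quicksort(a, lo, j);
            lo = i;
        } else {
            hybrid_quicksort(a, i, hi);
            hi = j;
        }
    }
    insertion_sort(a, lo, hi);
}
```

Call it as `hybrid_quicksort(arr, 0, n - 1);` for an array of `n` ints.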
My thinking is that qsort does a lot of swapping around and destroys (i.e. reverses) existing sorted subsequences, so there should be some way to improve on it by exploiting the structure of the data; unfortunately, all my attempts have failed so far.
If you are curious what kind of data this is: these are sets of poker hands evaluated on various boards, already sorted on the previous board (which is where the sorted subsequences come from).
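To make "exploiting the structure" concrete: what I have in mind is something like a natural merge sort, i.e. detect the ascending runs that already exist and merge neighbouring runs until only one run is left. A minimal sketch (illustrative names, deliberately simple bookkeeping) would look like this, although so far nothing along these lines has beaten the tuned quicksort on my data:

```c
#include <stdlib.h>
#include <string.h>

/* Merge two adjacent sorted ranges a[lo..mid) and a[mid..hi) via buf. */
static void merge_runs(int *a, int *buf, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) buf[k++] = a[i++];
    while (j < hi)  buf[k++] = a[j++];
    memcpy(a + lo, buf + lo, (hi - lo) * sizeof(int));
}

/* Natural merge sort: repeatedly find existing ascending runs and merge
 * neighbouring pairs until the whole array is one run. */
static void natural_merge_sort(int *a, size_t n)
{
    if (n < 2) return;
    int *buf = malloc(n * sizeof(int));
    if (!buf) return;                           /* sketch: give up on OOM */

    for (;;) {
        size_t start = 0, merges = 0;
        while (start < n) {
            /* find the end of the ascending run that begins at 'start' */
            size_t mid = start + 1;
            while (mid < n && a[mid - 1] <= a[mid]) mid++;
            if (mid == n) break;                /* trailing run, nothing to merge */
            /* find the end of the following run */
            size_t end = mid + 1;
            while (end < n && a[end - 1] <= a[end]) end++;
            merge_runs(a, buf, start, mid, end);
            start = end;
            merges++;
        }
        if (merges == 0) break;                 /* one run spans the array: sorted */
    }
    free(buf);
}
```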
The function is in C. I use Visual Studio 2010. Any ideas?
Sample data: http://pastebin.com/kKUdnU3N
Sample full execution (1176 sorts): https://dl.dropbox.com/u/86311885/out.zip
You should look into QuickSort or MergeSort if you want faster sorting algorithms. Unlike InsertionSort (and SelectionSort), they are recursive, but still fairly easy to implement. You can find many examples if you look around on the internet.
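For example, a minimal top-down merge sort for an int array might look roughly like this (the temporary buffer is allocated on every call for clarity; a tuned version would allocate it once and reuse it):

```c
#include <stdlib.h>
#include <string.h>

/* Minimal top-down merge sort for an int array; buffer handling is kept
 * simple for clarity rather than speed. */
static void merge_sort(int *a, size_t n)
{
    if (n < 2) return;
    size_t mid = n / 2;
    merge_sort(a, mid);                   /* sort left half */
    merge_sort(a + mid, n - mid);         /* sort right half */

    int *buf = malloc(n * sizeof(int));
    if (!buf) return;                     /* sketch: skip the merge on OOM */
    size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        buf[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < mid) buf[k++] = a[i++];
    while (j < n)   buf[k++] = a[j++];
    memcpy(a, buf, n * sizeof(int));
    free(buf);
}
```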
Quicksort is well suited for large data sets; it is among the fastest and most efficient algorithms for large amounts of data. But with a naive pivot choice (e.g. always taking the first element) it is inefficient if the elements are already sorted, which triggers its worst-case time complexity of O(n²).
What if you first do a pass through the array to group the numbers and get rid of duplicates? Each number could go into a hash table where the number is the key and the number of times it appears is the value. So if the number 750000 appears 57 times in the array, the hash table would hold key=750000, value=57. Then you can sort the much smaller set of hash table keys, which should be fewer than 100 elements.
With this you only need to make one pass through the array, plus another pass through the much smaller hash table key list. This should avoid most of the swaps and comparisons.
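A rough sketch of that idea in C, using a tiny fixed-size open-addressing table instead of a full hash library; the table size, hash constant, and names are only illustrative, and it assumes the number of distinct values really is small:

```c
#include <stdlib.h>

#define TABLE_SIZE 1024                 /* power of two, larger than the number of distinct keys */

typedef struct { int key; size_t count; } entry_t;

/* Compare two entries by key, for qsort. */
static int cmp_entry(const void *a, const void *b)
{
    int x = ((const entry_t *)a)->key, y = ((const entry_t *)b)->key;
    return (x > y) - (x < y);
}

/* Count occurrences of each distinct value, sort only the distinct values,
 * then expand the counts back into the array. */
static void count_sort_distinct(int *a, size_t n)
{
    entry_t entries[TABLE_SIZE];
    size_t slot[TABLE_SIZE] = {0};      /* 0 = empty, otherwise index+1 into entries */
    size_t distinct = 0;

    /* Pass 1: build (value, count) pairs using a small open-addressing table. */
    for (size_t i = 0; i < n; i++) {
        size_t h = ((unsigned)a[i] * 2654435761u) & (TABLE_SIZE - 1);
        while (slot[h] && entries[slot[h] - 1].key != a[i])
            h = (h + 1) & (TABLE_SIZE - 1);             /* linear probing */
        if (!slot[h]) {
            if (distinct == TABLE_SIZE - 1) return;     /* too many distinct keys: leave the
                                                           array untouched; real code would
                                                           grow the table or fall back to qsort */
            entries[distinct].key = a[i];
            entries[distinct].count = 0;
            slot[h] = ++distinct;
        }
        entries[slot[h] - 1].count++;
    }

    /* Sort only the distinct values (typically far fewer than n). */
    qsort(entries, distinct, sizeof(entry_t), cmp_entry);

    /* Pass 2: write each value back count times, in sorted order. */
    size_t out = 0;
    for (size_t d = 0; d < distinct; d++)
        for (size_t c = 0; c < entries[d].count; c++)
            a[out++] = entries[d].key;
}
```

Whether this beats a tuned quicksort depends on how many distinct values actually occur per sort; the sketch assumes that number is small.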