Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C: Sorting Methods Analysis

I have alot of different sorting algorithms which all have the following signature:

void <METHOD>_sort_ints(int * array, const unsigned int ARRAY_LENGTH);

Are there any testing suites for sorting which I could use for the purpose of making empirical comparisons?

like image 496
Ande Turner Avatar asked Aug 27 '09 03:08

Ande Turner


4 Answers

This detailed discussion, as well as linking to a large number of related web pages you are likely to find useful, also describes a useful set of input data for testing sorting algorithms (see the linked page for reasons). Summarising:

  1. Completely randomly reshuffled array
  2. Already sorted array
  3. Already sorted in reverse order array
  4. Chainsaw array
  5. Array of identical elements
  6. Already sorted array with N permutations (with N from 0.1 to 10% of the size)
  7. Already sorted array in reverse order array with N permutations
  8. Data that have normal distribution with duplicate (or close) keys (for stable sorting only)
  9. Pseudorandom data (daily values of S&P500 or other index for a decade might be a good test set here; they are available from Yahoo.com )
like image 139
ire_and_curses Avatar answered Sep 24 '22 04:09

ire_and_curses


The definitive study of sorting is Bob Sedgewick's doctoral dissertation. But there's a lot of good information in his algorithms textbooks, and those are the first two places I would look for test suite and methodology. If you've had a recent course you'll know more than I do; last time I had a course, the best method was to use quicksort down to partitions of size 12, then run insertion sort on the entire array. But the answers change as quickly as the hardware.

Jon Bentley's Programming Perls books have some other info on sorting.

You can quickly whip up a test suite containing

  • Random integers

  • Sorted integers

  • Reverse sorted integers

  • Sorted integers, mildly perturbed

If memory serves, these are the most important cases for a sort algorithm.

If you're looking to sort arrays that won't fit in cache, you'll need to measure cache effects. valgrind is effective if slow.

like image 20
Norman Ramsey Avatar answered Sep 26 '22 04:09

Norman Ramsey


sortperf.py has a well selected suite of benchmark test cases and was used to support the essay found here and make timsort THE sort in Python lo that many years ago. Note that, at long last, Java may be moving to timsort too, thanks to Josh Block (see here), so I imagine they have written their own version of the benchmark test cases -- however, I can't easily find a reference to it. (timsort, a stable, adaptive, iterative natural mergesort variant, is especially suited to languages with reference-to-object semantics like Python and Java, where "data movement" is relatively cheap [[since all that's ever being moved is references aka pointers, not blobs of unbounded size;-)]], but comparisons can be relatively costly [[since there is no upper bound to the complexity of a comparison function -- but then this holds for any language where sorting may be customized via a custom comparison or key-extraction function]]).

like image 26
Alex Martelli Avatar answered Sep 24 '22 04:09

Alex Martelli


This site shows the various sorting algorithms using four groups: http://www.sorting-algorithms.com/

In addition to the four group in Norman's answer you want to check the sorting algorithms with collection of numbers that have a few similarities in the numbers:

  • All integers are unique
  • The same integer in the whole collection
  • Few Unique Keys

Changing the number of elements in the collection is also a good practice check each algorithm with 1K, 1M, 1G etc. to see what are the memory implications of that algorithm.

like image 43
Dror Helper Avatar answered Sep 25 '22 04:09

Dror Helper