C: Sorting Methods Analysis

Question

I have alot of different sorting algorithms which all have the following signature:

void <METHOD>_sort_ints(int * array, const unsigned int ARRAY_LENGTH);

Are there any testing suites for sorting which I could use for the purpose of making empirical comparisons?

ire_and_curses · Accepted Answer

This detailed discussion, as well as linking to a large number of related web pages you are likely to find useful, also describes a useful set of input data for testing sorting algorithms (see the linked page for reasons). Summarising:

Completely randomly reshuffled array
Already sorted array
Already sorted in reverse order array
Chainsaw array
Array of identical elements
Already sorted array with N permutations (with N from 0.1 to 10% of the size)
Already sorted array in reverse order array with N permutations
Data that have normal distribution with duplicate (or close) keys (for stable sorting only)
Pseudorandom data (daily values of S&P500 or other index for a decade might be a good test set here; they are available from Yahoo.com )

Norman Ramsey · Answer

The definitive study of sorting is Bob Sedgewick's doctoral dissertation. But there's a lot of good information in his algorithms textbooks, and those are the first two places I would look for test suite and methodology. If you've had a recent course you'll know more than I do; last time I had a course, the best method was to use quicksort down to partitions of size 12, then run insertion sort on the entire array. But the answers change as quickly as the hardware.

Jon Bentley's Programming Perls books have some other info on sorting.

You can quickly whip up a test suite containing

Random integers
Sorted integers
Reverse sorted integers
Sorted integers, mildly perturbed

If memory serves, these are the most important cases for a sort algorithm.

If you're looking to sort arrays that won't fit in cache, you'll need to measure cache effects. valgrind is effective if slow.

Alex Martelli · Answer

sortperf.py has a well selected suite of benchmark test cases and was used to support the essay found here and make timsort THE sort in Python lo that many years ago. Note that, at long last, Java may be moving to timsort too, thanks to Josh Block (see here), so I imagine they have written their own version of the benchmark test cases -- however, I can't easily find a reference to it. (timsort, a stable, adaptive, iterative natural mergesort variant, is especially suited to languages with reference-to-object semantics like Python and Java, where "data movement" is relatively cheap [[since all that's ever being moved is references aka pointers, not blobs of unbounded size;-)]], but comparisons can be relatively costly [[since there is no upper bound to the complexity of a comparison function -- but then this holds for any language where sorting may be customized via a custom comparison or key-extraction function]]).

Dror Helper · Answer

This site shows the various sorting algorithms using four groups: http://www.sorting-algorithms.com/

In addition to the four group in Norman's answer you want to check the sorting algorithms with collection of numbers that have a few similarities in the numbers:

All integers are unique
The same integer in the whole collection
Few Unique Keys

Changing the number of elements in the collection is also a good practice check each algorithm with 1K, 1M, 1G etc. to see what are the memory implications of that algorithm.

C: Sorting Methods Analysis

Tags:

c

sorting

analysis

Ande Turner

4 Answers

ire_and_curses

Norman Ramsey

Alex Martelli

Dror Helper

Recent Activity

Donate For Us

C: Sorting Methods Analysis

Tags:

c

sorting

analysis

Ande Turner

4 Answers

ire_and_curses

Norman Ramsey

Alex Martelli

Dror Helper

Related questions

Recent Activity

Donate For Us