Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Profiling sorting algorithms against partially sorted data

We know that several sorts, such as insertion sort, are great on arrays that are 'mostly-sorted' and not so great on random data.

Suppose we wanted to profile the performance improvement/degradation of such an algorithm relative to how 'sorted' the input data is. What would be a good way to generate an 'increasingly sorted' or 'increasingly random' array of elements? How might we measure the 'sortedness' of the input?

like image 853
int3 Avatar asked Feb 25 '11 03:02

int3


2 Answers

Number of Inversion is a usual measure of how much sorted an array is.

A pair of elements (pi,pj) in permutation p is called an inversion in a permutation if i<j and pi >pj. For example, in the permutation (3,1,2,5,4) contains the 3 inversions (3,1), (3,2) and (5,4).

A sorted array got 0 inversion and reverse sorted array got n*(n-1)/2.

like image 126
Zimbabao Avatar answered Oct 05 '22 20:10

Zimbabao


You could generate a "partially sorted" dataset by interrupting a modern Fisher-Yates shuffle run on an already ordered dataset.

Also, if you only need a few essentially fixed sets of partially sorted data, then you could generate a column graph of position vs value for each and just eye-ball them. That would let you quickly see the general random-ness of a set, as well things like how much localised order there is.

like image 30
staticsan Avatar answered Oct 05 '22 18:10

staticsan