Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to measure how sorted a list is?

You can simply count the number of inversions in the list.

Inversion

An inversion in a sequence of elements of type T is a pair of sequence elements that appear out of order according to some ordering < on the set of T's.

From Wikipedia:

Formally, let A(1), A(2), ..., A(n) be a sequence of n numbers.
If i < j and A(i) > A(j), then the pair (i,j) is called an inversion of A.

The inversion number of a sequence is one common measure of its sortedness.
Formally, the inversion number is defined to be the number of inversions, that is,

definition

To make these definitions clearer, consider the example sequence 9, 5, 7, 6. This sequence has the inversions (0,1), (0,2), (0,3), (2,3) and the inversion number 4.

If you want a value between 0 and 1, you can divide the inversion number by N choose 2.

To actually create an algorithm to compute this score for how sorted a list is, you have two approaches:

Approach 1 (Deterministic)

Modify your favorite sorting algorithm to keep track of how many inversions it is correcting as it runs. Though this is nontrivial and has varying implementations depending on the sorting algorithm you choose, you will end up with an algorithm that is no more expensive (in terms of complexity) than the sorting algorithm you started with.

If you take this route, be aware that it's not as simple as counting "swaps." Mergesort, for example, is worst case O(N log N), yet if it is run on a list sorted in descending order, it will correct all N choose 2 inversions. That's O(N^2) inversions corrected in O(N log N) operations. So some operations must inevitably be correcting more than one inversion at a time. You have to be careful with your implementation. Note: you can do this with O(N log N) complexity, it's just tricky.

Related: calculating the number of “inversions” in a permutation

Approach 2 (Stochastic)

  • Randomly sample pairs (i,j), where i != j
  • For each pair, determine whether list[min(i,j)] < list[max(i,j)] (0 or 1)
  • Compute the average of these comparisons and then normalize by N choose 2

I would personally go with the stochastic approach unless you have a requirement of exactness - if only because it's so easy to implement.


If what you really want is a value (z') between -1 (sorted descending) to 1 (sorted ascending), you can simply map the value above (z), which is between 0 (sorted ascending) and 1 (sorted descending), to this range using this formula:

z' = -2 * z + 1

The traditional measure of how sorted a list (or other sequential structure) is, is the number of inversions.

The number of inversions is the number of pairs (a,b) st index of a < b AND b << a. For these purposes << represents whatever ordering relation you choose for your particular sort.

A fully sorted list has no inversions, and a completely reversed list has the maximum number of inversions.


You can use actual correlation.

Suppose that to each item in the sorted list, you assign an integer rank starting from zero. Note that a graph of the elements position index versus rank will look like dots in a straight line (correlation of 1.0 between the position and rank).

You can compute a correlation on this data. For a reverse sort you will get -1 and so on.


There has been great answers, and I would like to add a mathematical aspect for completeness:

  • You can measure how sorted a list is by measuring how much it is correlated to a sorted list. To do that, you may use the rank correlation (the most known being Spearman's), which is exactly the same as the usual correlation, but it uses the rank of elements in a list instead of the analog values of its items.

  • Many extensions exist, like a correlation coefficient (+1 for exact sort, -1 for exact inversion)

  • This allows you to have statistical properties for this measure, like the permutational central limit theorem, which allows you to know the distribution of this measure for random lists.


Apart from inversion count, for numeric lists, mean square distance from the sorted state is imaginable:

#! ruby
d = -> a { a.zip( a.sort ).map { |u, v| ( u - v ) ** 2 }.reduce( :+ ) ** 0.5 }

a = 8, 7, 3, 4, 10, 9, 6, 2, 5, 1
d.( a ) #=> 15.556
d.( a.sort ) #=> 0.0
d.( a.sort.reverse ) # => 18.166 is the worrst case

I am not sure of the "best" method, but a simple one would be to compare every element with the one after it, incrementing a counter if element2 > element 1 (or whatever you want to test) and then divide by the total number of elements. It should give you a percentage.