EDIT: Wow, many great responses. Yes, I am using this as a fitness function for judging the quality of a sort performed by a genetic algorithm. So cost-of-evaluation is important (i.e., it has to be fast, preferably O(n)
.)
As part of an AI application I am toying with, I'd like to be able to rate a candidate array of integers based on its monotonicity, aka its "sortedness". At the moment, I'm using a heuristic that calculates the longest sorted run, and then divides that by the length of the array:
public double monotonicity(int[] array) {
if (array.length == 0) return 1d;
int longestRun = longestSortedRun(array);
return (double) longestRun / (double) array.length;
}
public int longestSortedRun(int[] array) {
if (array.length == 0) return 0;
int longestRun = 1;
int currentRun = 1;
for (int i = 1; i < array.length; i++) {
if (array[i] >= array[i - 1]) {
currentRun++;
} else {
currentRun = 1;
}
if (currentRun > longestRun) longestRun = currentRun;
}
return longestRun;
}
This is a good start, but it fails to take into account the possibility that there may be "clumps" of sorted sub-sequences. E.g.:
{ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9}
This array is partitioned into three sorted sub-sequences. My algorithm will rate it as only 40% sorted, but intuitively, it should get a higher score than that. Is there a standard algorithm for this sort of thing?
This seems like a good candidate for Levenshtein Damerau–Levenshtein distance - the number of swaps needed to sort the array. This should be proportional to how far each item is from where it should be in a sorted array.
Here's a simple ruby algorithm that sums the squares of the distances. It seems a good measure of sortedness - the result gets smaller every time two out-of-order elements are swapped.
ap = a.sort
sum = 0
a.each_index{|i| j = ap.index(a[i])-i
sum += (j*j)
}
dist = sum/(a.size*a.size)
I expect that the choice of function to use depends very strongly on what you intend to use it for. Based on your question, I would guess that you are using a genetic system to create a sorting program, and this is to be the ranking function. If that is the case, then speed of execution is crucial. Based on that, I bet your longest-sorted-subsequence algorithm would work pretty well. That sounds like it should define fitness pretty well.
Something like these? http://en.wikipedia.org/wiki/Rank_correlation
Here's one I just made up.
For each pair of adjacent values, calculate the numeric difference between them. If the second is greater than or equal to the first, add that to the sorted
total, otherwise add to the unsorted
total. When done, take the ratio of the two.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With