Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can you compare to what extent two lists are in the same order?

Tags:

algorithm

I have two arrays containing the same elements, but in different orders, and I want to know the extent to which their orders differ.

The method I tried, didn't work. it was as follows:

For each list I built a matrix which recorded for each pair of elements whether they were above or below each other in the list. I then calculated a pearson correlation coefficient of these two matrices. This worked extremely badly. Here's a trivial example:

list 1:
1
2
3
4

list 2:
1
3
2
4

The method I described above produced matrices like this (where 1 means the row number is higher than the column, and 0 vice-versa):

list 1:
  1 2 3 4
1   1 1 1
2     1 1
3       1
4

list 2:
  1 2 3 4 
1   1 1 1
2     0 1 
3       1
4

Since the only difference is the order of elements 2 and 3, these should be deemed to be very similar. The Pearson Correlation Coefficient for those two matrices is 0, suggesting they are not correlated at all. I guess the problem is that what I'm looking for is not really a correlation coefficient, but some other kind of similarity measure. Edit distance, perhaps?

Can anyone suggest anything better?

like image 466
Ben Avatar asked Dec 18 '08 15:12

Ben


4 Answers

Mean square of differences of indices of each element.

List 1: A B C D E
List 2: A D C B E

Indices of each element of List 1 in List 2 (zero based)

A B C D E
0 3 2 1 4

Indices of each element of List 1 in List 1 (zero based)

A B C D E
0 1 2 3 4

Differences:

A  B C D E
0 -2 0 2 0

Square of differences:

A B C D E
  4   4

Average differentness = 8 / 5.

like image 78
jamesh Avatar answered Nov 13 '22 11:11

jamesh


Just an idea, but is there any mileage in adapting a standard sort algorithm to count the number of swap operations needed to transform list1 into list2?

I think that defining the compare function may be difficult though (perhaps even just as difficult as the original problem!), and this may be inefficient.

edit: thinking about this a bit more, the compare function would essentially be defined by the target list itself. So for example if list 2 is:

1 4 6 5 3

...then the compare function should result in 1 < 4 < 6 < 5 < 3 (and return equality where entries are equal).

Then the swap function just needs to be extended to count the swap operations.

like image 20
frankodwyer Avatar answered Nov 13 '22 11:11

frankodwyer


A bit late for the party here, but just for the record, I think Ben almost had it... if you'd looked further into correlation coefficients, I think you'd have found that Spearman's rank correlation coefficient might have been the way to go.

Interestingly, jamesh seems to have derived a similar measure, but not normalized.

See this recent SO answer.

like image 40
bubaker Avatar answered Nov 13 '22 10:11

bubaker


You might consider how many changes it takes to transform one string into another (which I guess it was you were getting at when you mentioned edit distance).

See: http://en.wikipedia.org/wiki/Levenshtein_distance

Although I don't think l-distance takes into account rotation. If you allow rotation as an operation then:

1, 2, 3, 4

and

2, 3, 4, 1

Are pretty similar.

like image 23
Dana Avatar answered Nov 13 '22 11:11

Dana