How can you compare to what extent two lists are in the same order?

Question

I have two arrays containing the same elements, but in different orders, and I want to know the extent to which their orders differ.

The method I tried, didn't work. it was as follows:

For each list I built a matrix which recorded for each pair of elements whether they were above or below each other in the list. I then calculated a pearson correlation coefficient of these two matrices. This worked extremely badly. Here's a trivial example:

list 1:
1
2
3
4

list 2:
1
3
2
4

The method I described above produced matrices like this (where 1 means the row number is higher than the column, and 0 vice-versa):

Since the only difference is the order of elements 2 and 3, these should be deemed to be very similar. The Pearson Correlation Coefficient for those two matrices is 0, suggesting they are not correlated at all. I guess the problem is that what I'm looking for is not really a correlation coefficient, but some other kind of similarity measure. Edit distance, perhaps?

Can anyone suggest anything better?

jamesh · Accepted Answer

Mean square of differences of indices of each element.

List 1: A B C D E
List 2: A D C B E

Indices of each element of List 1 in List 2 (zero based)

A B C D E
0 3 2 1 4

Indices of each element of List 1 in List 1 (zero based)

A B C D E
0 1 2 3 4

Differences:

A  B C D E
0 -2 0 2 0

Square of differences:

A B C D E
  4   4

Average differentness = 8 / 5.

frankodwyer · Answer

Just an idea, but is there any mileage in adapting a standard sort algorithm to count the number of swap operations needed to transform list1 into list2?

I think that defining the compare function may be difficult though (perhaps even just as difficult as the original problem!), and this may be inefficient.

edit: thinking about this a bit more, the compare function would essentially be defined by the target list itself. So for example if list 2 is:

1 4 6 5 3

...then the compare function should result in 1 < 4 < 6 < 5 < 3 (and return equality where entries are equal).

Then the swap function just needs to be extended to count the swap operations.

bubaker · Answer

A bit late for the party here, but just for the record, I think Ben almost had it... if you'd looked further into correlation coefficients, I think you'd have found that Spearman's rank correlation coefficient might have been the way to go.

Interestingly, jamesh seems to have derived a similar measure, but not normalized.

See this recent SO answer.

Interestingly, jamesh seems to have derived a similar measure, but not normalized.

See this recent SO answer.

Dana · Answer

You might consider how many changes it takes to transform one string into another (which I guess it was you were getting at when you mentioned edit distance).

See: http://en.wikipedia.org/wiki/Levenshtein_distance

Although I don't think l-distance takes into account rotation. If you allow rotation as an operation then:

1, 2, 3, 4

and

2, 3, 4, 1

Are pretty similar.

How can you compare to what extent two lists are in the same order?

Tags:

algorithm

Ben

4 Answers

jamesh

frankodwyer

bubaker

Dana

Recent Activity

Donate For Us

How can you compare to what extent two lists are in the same order?

Tags:

algorithm

Ben

4 Answers

jamesh

frankodwyer

bubaker

Dana

Related questions

Recent Activity

Donate For Us