
How to compare ranked lists

I have two lists of ranked items. Each item has a rank and an associated score, and the score determines the rank. The two lists can (and usually do) contain different items; their intersection can even be empty. I need measures to compare such rankings. Are there well-known algorithms (in the literature or in real-world systems) for this? The distance measure should take into account the scores as well as the ranks of the items.

Valerio Schiavoni asked Nov 26 '12 22:11




2 Answers

This question has never been answered before, but I still think it's important to a lot of people out there:

Your two requirements, non-conjoint lists and the importance of rank position, are not met by common correlation tests. Most of them (Kendall's tau, for example) also do not take into account where in the list a disagreement occurs:

>>> from scipy.stats import kendalltau
>>> kendalltau([1,2,3,4,5], [2,1,3,4,5])
KendalltauResult(correlation=0.79999999999999982, pvalue=0.050043527347496564)
>>> kendalltau([1,2,3,4,5], [1,2,3,5,4])
KendalltauResult(correlation=0.79999999999999982, pvalue=0.050043527347496564)

The first comparison should yield a significantly smaller value than the second, because a swap at the head of the list matters more than a swap at the tail (your second requirement).

The example also shows that both lists need to be the same size and contain the same elements, which conflicts with your first requirement.

Possible solution:

A measure that satisfies both requirements is Rank Biased Overlap (RBO). It is a generalization of the so-called average overlap, which is wonderfully illustrated in this blog. The same author also published an implementation of RBO.
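
To make the idea concrete, here is a minimal sketch of a truncated RBO score. It is not the implementation mentioned above: the function name `rbo_truncated` and the default `p=0.9` are assumptions chosen for illustration. The sketch uses only the ranks (not the scores) and sums the agreement down to the length of the longer list, so two identical lists of length k score 1 - p^k rather than 1. The published measure adds an extrapolation step that is left out here.

```python
def rbo_truncated(s, t, p=0.9):
    """Truncated Rank Biased Overlap of two ranked lists (best item first).

    At every depth d, the agreement is the fraction of items shared by the
    top-d prefixes of both lists. Depth d gets weight p**(d - 1), so the
    head of the lists counts more than the tail.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = max(len(s), len(t))
    total = 0.0
    for d in range(1, depth + 1):
        # Items may differ between the lists; only the shared ones count.
        agreement = len(set(s[:d]) & set(t[:d])) / d
        total += p ** (d - 1) * agreement
    return (1 - p) * total


reference = [1, 2, 3, 4, 5]
head_swap = [2, 1, 3, 4, 5]
tail_swap = [1, 2, 3, 5, 4]

print(round(rbo_truncated(reference, head_swap), 4))  # 0.3095
print(round(rbo_truncated(reference, tail_swap), 4))  # 0.3913
print(rbo_truncated(["a", "b"], ["x", "y"]))          # 0.0
```

Unlike Kendall's tau in the example above, the swap at the head lowers the score more than the swap at the tail, and lists without any common item are handled without error.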

Update Jan 2018:

  • Another implementation of RBO for Python 3.5.2
Mike Dooley answered Oct 01 '22 19:10



This may not solve the issue completely, but Kendall's weighted tau is definitely worth a look.

It provides a better way of calculating similarity between ranked lists when order matters, as it allows arbitrary weighting based on rank order.

For example, one may want to give more weight to agreement among the top 20 items of the list rather than weighting all positions uniformly.

It also has a nice implementation in SciPy (scipy.stats.weightedtau).
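
As a rough illustration, the sketch below calls `scipy.stats.weightedtau` with its default settings, which rank items by score and weight each position hyperbolically. The score values are made up for the example. Note that both inputs must hold scores for the same items in the same order, so this does not address lists with different items.

```python
from scipy.stats import weightedtau

# Hypothetical scores for the same five items; a higher score means a better rank.
reference = [5.0, 4.0, 3.0, 2.0, 1.0]
head_swap = [4.0, 5.0, 3.0, 2.0, 1.0]  # the top two items trade places
tail_swap = [5.0, 4.0, 3.0, 1.0, 2.0]  # the bottom two items trade places

# The result unpacks into (statistic, pvalue); the p-value is not computed.
tau_head, _ = weightedtau(reference, head_swap)
tau_tail, _ = weightedtau(reference, tail_swap)

print(tau_head, tau_tail)
```

With the default weighting, the swap at the head should produce the lower of the two values, which is the behaviour plain Kendall's tau lacks.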

alex_sp answered Oct 01 '22 19:10
