I have two lists of ranked items. Each item has a rank and an associated score, and the score determines the rank. The two lists can contain (and usually do contain) different items, that is, their intersection can be empty. I need measures to compare such rankings. Are there well-known algorithms (in the literature or in real-world systems) for this? The measure of distance should take into account the scores as well as the ranks of the items.
Your two requirements, i.e. that the lists need not be conjoint and that rank positions matter, are not met by common correlation tests. In addition, most of them (Kendall tau, for example) do not weight positions within the list:
>>> from scipy.stats import kendalltau
>>> kendalltau([1,2,3,4,5], [2,1,3,4,5])
KendalltauResult(correlation=0.79999999999999982, pvalue=0.050043527347496564)
>>> kendalltau([1,2,3,4,5], [1,2,3,5,4])
KendalltauResult(correlation=0.79999999999999982, pvalue=0.050043527347496564)
The 1st comparison should yield a significantly smaller value than the 2nd one, because the head of the list is more important than the tail (2nd requirement).
In addition to that, one can see that both lists need to have the same length and contain the same elements, which conflicts with the 1st requirement.
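A quick way to see the 1st point with the same kendalltau function as above (the exact error message may differ between SciPy versions):
>>> kendalltau([1,2,3,4,5], [1,2,3])
Traceback (most recent call last):
  ...
ValueError: ...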
Possible solution:
The measure that satisfies all your needs is called Rank-Biased Overlap (RBO). It's a generalization of the so-called average overlap, which is wonderfully illustrated in this blog. The same author also put out an implementation of RBO.
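For illustration, here is a minimal sketch of a truncated RBO in Python. It is not the linked implementation: it assumes each ranking is a sequence of item ids ordered best-first, drops the extrapolation/residual part of the full measure, and uses a hypothetical helper name rbo:

def rbo(list1, list2, p=0.9):
    """Truncated Rank-Biased Overlap between two rankings (best item first).

    p in (0, 1) controls top-weightedness: the smaller p is, the more the
    score depends on the heads of the lists. The lists may have different
    lengths and may share no items at all (the score is 0 in that case).
    """
    seen1, seen2 = set(), set()
    score = 0.0
    for depth in range(max(len(list1), len(list2))):
        if depth < len(list1):
            seen1.add(list1[depth])
        if depth < len(list2):
            seen2.add(list2[depth])
        # Agreement at this depth: fraction of items the two prefixes share.
        agreement = len(seen1 & seen2) / (depth + 1)
        score += (p ** depth) * agreement
    return (1 - p) * score

# A disagreement at the head costs more than the same disagreement at the
# tail. Both scores stay well below 1 for such short lists because the
# weights of deeper, unseen ranks are simply dropped here:
print(rbo(["a", "b", "c", "d", "e"], ["b", "a", "c", "d", "e"]))  # lower
print(rbo(["a", "b", "c", "d", "e"], ["a", "b", "c", "e", "d"]))  # higher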
Update Jan 2018:
Maybe not solving the issue completely, but definitely worth taking a look at Kendall's weighted tau.
It provides a better way of calculating similarity between ranked lists when position matters, as it allows arbitrary weighting based on rank order.
For example, one may want to weight agreement in the top 20 items of the list more heavily rather than weighting all positions uniformly.
It also has a nice implementation in SciPy (scipy.stats.weightedtau).
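As a rough sketch, the kendalltau comparison from above redone with scipy.stats.weightedtau (passing rank=False is an assumption on my part: it makes the array index the importance rank, so the default hyperbolic weigher emphasizes the head of the list):

from scipy.stats import weightedtau

# rank=False: the array index itself is the importance rank, so the default
# hyperbolic weigher 1/(1 + index) gives the head of the list the most weight.
head_swap = weightedtau([1, 2, 3, 4, 5], [2, 1, 3, 4, 5], rank=False)
tail_swap = weightedtau([1, 2, 3, 4, 5], [1, 2, 3, 5, 4], rank=False)

# Unlike the unweighted kendalltau example above, the head swap now scores
# clearly lower than the tail swap.
print(head_swap.correlation, tail_swap.correlation)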