
Spearman rank correlation in Python with ties

I want to compute the Spearman rank correlation in Python, most likely with the SciPy implementation (scipy.stats.spearmanr).

The data at hand looks like this, for example (dictionaries):

{a:0.3, b:0.2, c:0.2} and {a:0.5, b:0.6, c:0.4}

To pass this to the Spearman function, I would first assign ranks (descending), if I am correct:

[1,2,3] and [2,1,3]

Now I want to take ties into account. For the first vector, would I use:

[1,2,2] or [1,2.5,2.5]

Basically, is this whole approach correct, and how should ties be handled for such dictionary-based data?

As @Jaime suggested, the spearmanr function works directly with the values, but why is this behavior possible:

In [5]: spearmanr([0,1,2,3],[1,3,2,0])
Out[5]: (-0.39999999999999997, 0.59999999999999998)

In [6]: spearmanr([10,7,6,5],[0.9,0.5,0.6,1.0])
Out[6]: (-0.39999999999999997, 0.59999999999999998)

Thanks!

fsociety asked Feb 11 '13 15:02



1 Answer

scipy.stats.spearmanr will take care of computing the ranks for you. You simply have to give it the data, with both sequences listing the items in the same order:

>>> scipy.stats.spearmanr([0.3, 0.2, 0.2], [0.5, 0.6, 0.4])
(0.0, 1.0)
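
For dictionary-based data like that in the question, "the correct order" means both value lists must follow the same key order. A minimal sketch of one way to do that, assuming string keys and that only keys present in both dictionaries should be compared (the helper name aligned_values is hypothetical, not part of SciPy):

```python
def aligned_values(d1, d2):
    """Return the values of d1 and d2 in a shared key order.

    Hypothetical helper: only keys present in both dictionaries are kept,
    and they are sorted so that position i refers to the same key in both lists.
    """
    keys = sorted(d1.keys() & d2.keys())
    return [d1[k] for k in keys], [d2[k] for k in keys]


# Example data from the question; the second dict is deliberately
# written in a different key order to show that alignment handles it.
scores_a = {"a": 0.3, "b": 0.2, "c": 0.2}
scores_b = {"c": 0.4, "a": 0.5, "b": 0.6}

x, y = aligned_values(scores_a, scores_b)
print(x)  # [0.3, 0.2, 0.2]
print(y)  # [0.5, 0.6, 0.4]
```

The two lists x and y can then be passed as they are to scipy.stats.spearmanr(x, y). No manual ranking is needed.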

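If you already have the ranked data, you can call scipy.stats.pearsonr on it to get the same result. As the examples below show, both of the rankings you tried give the same result here, although [1, 2.5, 2.5] (tied values share the average of their ranks) is the usual convention, and with other tie patterns the two schemes can give different correlations. The offset of the ranks does not matter either: shifting every rank by a constant, e.g. using [0, 1.5, 1.5], leaves the correlation unchanged: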

>>> scipy.stats.pearsonr([1, 2, 2], [2, 1, 3])
(0.0, 1.0)
>>> scipy.stats.pearsonr([1, 2.5, 2.5], [2, 1, 3])
(0.0, 1.0)
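
The same idea explains the two identical results in the question: only the ranks enter the computation, so any two data sets with the same rank pattern give the same coefficient. Here is a rough pure-Python sketch of the computation (average ranks for ties, then Pearson on the ranks). It only illustrates the idea and is not SciPy's actual code, and the function names are made up for this example:

```python
from math import sqrt


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient (no p-value)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(average_ranks([0.3, 0.2, 0.2]))  # [3.0, 1.5, 1.5]

# The two calls from the question: different values, same rank pattern.
rho1 = spearman([0, 1, 2, 3], [1, 3, 2, 0])
rho2 = spearman([10, 7, 6, 5], [0.9, 0.5, 0.6, 1.0])
print(rho1, rho2)  # both approximately -0.4
```

Ranking ascending or descending makes no difference, as long as both inputs are ranked in the same direction.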
Jaime answered Oct 12 '22 01:10
