I want to compute the Spearman rank correlation in Python, most likely with the SciPy implementation (scipy.stats.spearmanr).
The data at hand looks, for example, like this (dictionaries):
{a:0.3, b:0.2, c:0.2} and {a:0.5, b:0.6, c:0.4}
To pass them to spearmanr, I would first assign ranks, if I understand correctly (descending):
[1, 2, 3] and [2, 1, 3]
Now I want to take ties into account. For the first vector, should I use:
[1, 2, 2] or [1, 2.5, 2.5]
Basically, is this whole approach correct, and how should I handle ties for such dictionary-based data?
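The ranking step described above can be sketched with scipy.stats.rankdata. This assumes the dictionary keys are the strings 'a', 'b' and 'c' and that both dictionaries share the same keys. Tied values get the average of the ranks they occupy, which is rankdata's default (method='average'). The values are negated so that the largest value receives rank 1, matching the descending order used above.

```python
import numpy as np
from scipy.stats import rankdata

# Example data from the question; the string keys are an assumption
x = {'a': 0.3, 'b': 0.2, 'c': 0.2}
y = {'a': 0.5, 'b': 0.6, 'c': 0.4}

# Use one fixed key order so both vectors describe the same items
keys = sorted(x)
x_vals = np.array([x[k] for k in keys])
y_vals = np.array([y[k] for k in keys])

# Negate so the largest value gets rank 1 (descending order);
# tied values receive the average of the ranks they span
x_ranks = rankdata(-x_vals)
y_ranks = rankdata(-y_vals)

print(x_ranks.tolist())  # [1.0, 2.5, 2.5]
print(y_ranks.tolist())  # [2.0, 1.0, 3.0]
```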
As @Jaime suggested, the spearmanr function works directly with the values. But why do these two different inputs give the same result:
In [5]: spearmanr([0,1,2,3],[1,3,2,0])
Out[5]: (-0.39999999999999997, 0.59999999999999998)
In [6]: spearmanr([10,7,6,5],[0.9,0.5,0.6,1.0])
Out[6]: (-0.39999999999999997, 0.59999999999999998)
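One way to look into this is to print the ranks that spearmanr works from. The sketch below applies scipy.stats.rankdata (ascending order, average ranks for ties) to the two examples above. It suggests that only the ranks matter. In the second pair, every rank equals n + 1 minus the corresponding rank in the first pair, so both orderings are simply reversed, and reversing both leaves the correlation unchanged.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

x1, y1 = [0, 1, 2, 3], [1, 3, 2, 0]
x2, y2 = [10, 7, 6, 5], [0.9, 0.5, 0.6, 1.0]

# spearmanr correlates the ranks of the values, not the values themselves
for x, y in [(x1, y1), (x2, y2)]:
    print(rankdata(x).tolist(), rankdata(y).tolist())
# [1.0, 2.0, 3.0, 4.0] [2.0, 4.0, 3.0, 1.0]
# [4.0, 3.0, 2.0, 1.0] [3.0, 1.0, 2.0, 4.0]

# Each rank in the second pair is n + 1 minus the rank in the first pair,
# i.e. both orderings are reversed, which keeps the correlation the same
n = len(x1)
print(np.array_equal(rankdata(x2), n + 1 - rankdata(x1)))  # True
print(np.array_equal(rankdata(y2), n + 1 - rankdata(y1)))  # True

rho1, _ = spearmanr(x1, y1)
rho2, _ = spearmanr(x2, y2)
print(rho1, rho2)  # both approximately -0.4
```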
Thanks!
scipy.stats.spearmanr will take care of computing the ranks for you; you simply have to give it the data in the correct order:
>>> scipy.stats.spearmanr([0.3, 0.2, 0.2], [0.5, 0.6, 0.4])
(0.0, 1.0)
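For dictionary-based data, "the correct order" means that both value lists must follow the same key order. Below is a minimal sketch that assumes both dictionaries have the same keys. spearman_from_dicts is a hypothetical helper name, not part of SciPy. Unpacking the result as rho, pvalue works whether spearmanr returns a plain tuple, as in the older session above, or a result object, as in newer SciPy versions.

```python
from scipy.stats import spearmanr

def spearman_from_dicts(d1, d2):
    """Hypothetical helper: Spearman's rho between two dicts keyed by item."""
    if d1.keys() != d2.keys():
        raise ValueError("both dictionaries must contain the same keys")
    keys = sorted(d1)  # any fixed order works, as long as both lists share it
    rho, pvalue = spearmanr([d1[k] for k in keys], [d2[k] for k in keys])
    return rho, pvalue

rho, pvalue = spearman_from_dicts({'a': 0.3, 'b': 0.2, 'c': 0.2},
                                  {'a': 0.5, 'b': 0.6, 'c': 0.4})
print(rho, pvalue)  # approximately 0.0 and 1.0, as in the session above
```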
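If you already have ranked data, you can call scipy.stats.pearsonr on it to get the same result, since Spearman's coefficient is Pearson's correlation computed on the ranks. Note that scipy ranks in ascending order starting at 1, so internally the first vector becomes [3, 1.5, 1.5]. This does not matter as long as both vectors are ranked in the same direction, because reversing both rankings leaves the coefficient unchanged.

As the examples below show, both of the tie-handling schemes you tried give the same answer here. That is only because [1, 2, 2] and [1, 2.5, 2.5] happen to be linear transformations of each other in this small example. In general they differ, and the standard convention, which scipy follows, is to give tied values the average of the ranks they span, as in [1, 2.5, 2.5]: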
>>> scipy.stats.pearsonr([1, 2, 2], [2, 1, 3])
(0.0, 1.0)
>>> scipy.stats.pearsonr([1, 2.5, 2.5], [2, 1, 3])
(0.0, 1.0)
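The sketch below checks two claims. First, Pearson's correlation on average ranks reproduces spearmanr. Second, a [1, 2, 2]-style ranking only coincides with it in special cases. The x and y values are made up for illustration and are not from the question. rankdata's method='dense' produces the [1, 2, 2] style of ranking, while method='average' is the default.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

# Illustrative data with one tie in x (not from the question)
x = [1, 2, 2, 3, 4]
y = [1, 2, 3, 4, 5]

rho, _ = spearmanr(x, y)

# Pearson on average ranks ([1, 2.5, 2.5, 4, 5]) reproduces spearmanr
r_average, _ = pearsonr(rankdata(x, method='average'), rankdata(y))

# Pearson on dense ranks ([1, 2, 2, 3, 4]) gives a different value here
r_dense, _ = pearsonr(rankdata(x, method='dense'), rankdata(y))

print(rho, r_average, r_dense)
print(np.isclose(rho, r_average))  # True
print(np.isclose(rho, r_dense))    # False
```

With a single tie in the middle of three values the two schemes agree, but once the tie sits asymmetrically in a longer vector, only the average ranks match spearmanr.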