
How do I calculate a Spearman rank correlation in pandas?

I have a dataframe that looks like this, where each value corresponds to one of 5 distances (1000m, 800m, 600m, 400m, 200m):

'key1': array([1.21, 0.99, 6.66, 5.22, 3.33]),
'key2': array([2.21, 2.99, 5.66, 6.22, 2.33]),
'key3': array([4.21, 1.59, 6.66, 9.12, 0.23]), ...

I want to calculate a Spearman rank correlation between the values and the distances for each of the keys.

I have a lot of 'keys', so I would like to do this in pandas somehow, and then plot a graph of Spearman rank against distance, averaged across all keys.

Alex Trevylan asked Jan 28 '23 10:01


2 Answers

Since you mention pandas: DataFrame.corr supports method="spearman", so you can put the values and the distances side by side and read off the correlations with distance:

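import pandas as pd

# v: dict of per-key value arrays, d: array of distances (defined as in the answer below)
# Last row of the Spearman matrix = correlation of each key with distance
pd.concat([pd.DataFrame(v), pd.DataFrame(d)], axis=1).corr(method="spearman").iloc[-1]

Output:
key1   -0.5
key2   -0.4
key3    0.1
0       1.0
Name: 0, dtype: float64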
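A possibly simpler pandas-only variant, assuming pandas 0.24 or later (where DataFrame.corrwith accepts a method argument), correlates each key column with the distance Series directly instead of building the full correlation matrix. The names values, distance and rho are illustrative, and the data is copied from the question:

```python
import numpy as np
import pandas as pd

# Distances (m) and the per-key values from the question
d = np.array([1000, 800, 600, 400, 200])
v = {'key1': np.array([1.21, 0.99, 6.66, 5.22, 3.33]),
     'key2': np.array([2.21, 2.99, 5.66, 6.22, 2.33]),
     'key3': np.array([4.21, 1.59, 6.66, 9.12, 0.23])}

values = pd.DataFrame(v)   # one column per key, one row per distance
distance = pd.Series(d)    # shares the default 0..4 index with `values`

# Spearman correlation of every key column with the distance Series
rho = values.corrwith(distance, method="spearman")
print(rho)
```

The result is indexed by key only, so there is no extra self-correlation row (the "0  1.0" entry above) to drop.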
BENY answered Jan 30 '23 23:01


This is one way via a dictionary comprehension and scipy.stats.spearmanr.

import numpy as np
from scipy.stats import spearmanr

d = np.array([1000, 800, 600, 400, 200])

v = {'key1': np.array([  1.21,   0.99,   6.66,   5.22,   3.33]),
     'key2': np.array([  2.21,   2.99,   5.66,   6.22,   2.33]),
     'key3': np.array([  4.21,   1.59,   6.66,   9.12,   0.23])}

# spearmanr returns (correlation, p-value); [0] keeps the correlation
res = {k: spearmanr(v[k], d)[0] for k in sorted(v)}
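For reference, Spearman's rho is the Pearson correlation of the ranks of the two variables. A small check of that, using scipy.stats.rankdata (which, like spearmanr, gives tied values their average rank) on 'key1' from the question:

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

d = np.array([1000, 800, 600, 400, 200])
x = np.array([1.21, 0.99, 6.66, 5.22, 3.33])  # 'key1' from the question

# Spearman's rho = Pearson correlation of the ranks
rho_by_hand = np.corrcoef(rankdata(x), rankdata(d))[0, 1]
rho_scipy = spearmanr(x, d)[0]

print(rho_by_hand, rho_scipy)  # both -0.5, up to floating-point rounding
```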

If you want to use pandas, my advice is to perform your calculations as above and then create a dataframe from the results.

This will almost certainly be more efficient than performing the calculations after putting the data into pandas.

import pandas as pd
df = pd.DataFrame.from_dict(res, orient='index')

Result:

        0
key1 -0.5
key2 -0.4
key3  0.1
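For the averaging part of the question, one possible reading (an assumption, since the question does not say exactly what should be averaged) is a single mean rho across all keys, plus the mean value at each distance across keys for a value-vs-distance plot. A minimal sketch that recomputes res as above; mean_rho and mean_by_distance are hypothetical names:

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

d = np.array([1000, 800, 600, 400, 200])
v = {'key1': np.array([1.21, 0.99, 6.66, 5.22, 3.33]),
     'key2': np.array([2.21, 2.99, 5.66, 6.22, 2.33]),
     'key3': np.array([4.21, 1.59, 6.66, 9.12, 0.23])}

# Per-key Spearman correlation with distance, as in the answer above
res = {k: spearmanr(v[k], d)[0] for k in sorted(v)}
rho = pd.Series(res, name='spearman_rho')

# Assumed summary 1: one number, the mean rho across all keys
mean_rho = rho.mean()

# Assumed summary 2: mean value at each distance across keys,
# indexed by distance so it can be plotted against distance
mean_by_distance = pd.DataFrame(v, index=d).mean(axis=1)
```

Either Series can then be plotted, e.g. with mean_by_distance.plot() if matplotlib is installed.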
jpp answered Jan 31 '23 00:01