Let's say I have just calculated a correlation matrix. Using a pandas DataFrame, I would now like to get the highest correlations together with their row and column labels.
E.g. from:
   a  b  c  d  e  f
a  0  1  2  3  4  5
b  1  0  3  4  5  6
c  2  3  0  5  6  7
d  3  4  5  0  7  8
e  4  5  6  7  0  9
f  5  6  7  8  9  0
get:
e f 9
f d 8
f c 7
e d 7
etc...
I have read through the pandas docs and seen the groupby methods, as well as functions like head, but I'm a bit lost on how to perform this operation.
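To make the example reproducible, the matrix above can be loaded into a DataFrame roughly like this. It is a sketch: the name df is simply the one used in the answer, and these are the toy values from the question rather than a real correlation matrix (which would have 1 on the diagonal).

```python
import pandas as pd

labels = ["a", "b", "c", "d", "e", "f"]

# The example matrix from the question: symmetric, with zeros on the diagonal.
data = [
    [0, 1, 2, 3, 4, 5],
    [1, 0, 3, 4, 5, 6],
    [2, 3, 0, 5, 6, 7],
    [3, 4, 5, 0, 7, 8],
    [4, 5, 6, 7, 0, 9],
    [5, 6, 7, 8, 9, 0],
]

# Use the same labels for rows and columns, as in a correlation matrix.
df = pd.DataFrame(data, index=labels, columns=labels)
print(df)
```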
You can use stack here, which produces a Series whose (Multi)Index holds the row and column labels, and then call nlargest on that:
>>> df.stack()
a  a    0
   b    1
   c    2
   d    3
   e    4
   f    5
b  a    1
   b    0
   c    3
[etc.]
>>> df.stack().nlargest(6)
e  f    9
f  e    9
d  f    8
f  d    8
c  f    7
d  e    7
dtype: int64
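Because the matrix is symmetric, every pair shows up twice in the result above (e f and f e), and with a real correlation matrix the diagonal self-correlations of 1 would dominate the top. A minimal sketch that keeps only the cells strictly above the diagonal before stacking, so each pair appears once, assuming numpy is available. The question's matrix happens to follow the pattern i + j off the diagonal, which the sketch uses to rebuild it compactly; the names upper, pairs and top and the column names row, col and value are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's example: off-diagonal entry (i, j) is i + j, diagonal is 0.
labels = list("abcdef")
values = np.add.outer(np.arange(6), np.arange(6))
np.fill_diagonal(values, 0)
df = pd.DataFrame(values, index=labels, columns=labels)

# True strictly above the main diagonal (k=1), so each unordered pair is kept
# once and the self-pairs on the diagonal are excluded.
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() sets the masked-out cells to NaN; dropna() removes them explicitly,
# since newer pandas versions of stack() no longer drop NaN on their own.
pairs = df.where(upper).stack().dropna()

# Top four pairs, reshaped into the "row column value" layout from the question.
top = (
    pairs.nlargest(4)
    .rename_axis(["row", "col"])
    .reset_index(name="value")
)
print(top)
```

This should list the pairs (e, f), (d, f), (c, f) and (d, e) with values 9, 8, 7 and 7, matching the output asked for in the question. If strongly negative correlations matter too, one option is to rank by magnitude while keeping the sign, e.g. pairs.loc[pairs.abs().nlargest(4).index].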