Let's say I have just calculated a correlation matrix. Using a pandas DataFrame, I would now like to obtain the highest correlations with their axis names in place.
E.g. from:
   a, b, c, d, e, f
a, 0, 1, 2, 3, 4, 5
b, 1, 0, 3, 4, 5, 6
c, 2, 3, 0, 5, 6, 7
d, 3, 4, 5, 0, 7, 8
e, 4, 5, 6, 7, 0, 9
f, 5, 6, 7, 8, 9, 0
I want to get:
e f 9
f d 8
f c 7
e d 7
etc...
I have read through the pandas docs and seen the groupby methods as well as functions like head, but I'm not sure how to perform this operation.
You can use stack here, which will produce a Series with the row and column information in the index, and then call nlargest on that:
>>> df.stack()
a  a    0
   b    1
   c    2
   d    3
   e    4
   f    5
b  a    1
   b    0
   c    3
[etc.]
>>> df.stack().nlargest(6)
e  f    9
f  e    9
d  f    8
f  d    8
c  f    7
d  e    7
dtype: int64
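Because a correlation matrix is symmetric, the stacked Series contains every pair twice ((e, f) and (f, e) above), whereas the question lists each pair only once. In a real correlation matrix the diagonal is also 1, so the self-correlations would otherwise top the list. A minimal sketch of one way to handle both, assuming `df` is a square, symmetric matrix with matching row and column labels: mask everything except the strict upper triangle with numpy's `triu` before stacking. The names `upper`, `top` and `pairs`, and the column names `row`, `col` and `corr`, are only illustrative.

```python
import numpy as np
import pandas as pd

# The example matrix from the question (symmetric, zeros on the diagonal).
labels = list("abcdef")
df = pd.DataFrame(
    [[0, 1, 2, 3, 4, 5],
     [1, 0, 3, 4, 5, 6],
     [2, 3, 0, 5, 6, 7],
     [3, 4, 5, 0, 7, 8],
     [4, 5, 6, 7, 0, 9],
     [5, 6, 7, 8, 9, 0]],
    index=labels, columns=labels,
)

# Keep only the strict upper triangle; k=1 also excludes the diagonal.
# Masked cells become NaN, which turns the values into floats.
mask = np.triu(np.ones(df.shape), k=1).astype(bool)
upper = df.where(mask)

# Stack into a Series indexed by (row, column), drop the masked NaNs
# explicitly (newer stack implementations keep them), then take the top 4.
top = upper.stack().dropna().nlargest(4)
print(top)

# Optionally turn the result into a flat table with named columns.
pairs = top.rename_axis(["row", "col"]).reset_index(name="corr")
print(pairs)
```

With the example data this keeps (e, f), (d, f), (c, f) and (d, e), matching the four pairs in the question up to label order. For ties, `nlargest` keeps the first occurrences by default (`keep="first"`), which is why (c, f) and (d, e) are chosen over other pairs with the value 7, had there been more of them.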