
Get N largest values from pandas array, with index and column headings intact

Tags:

python

pandas

Let's say I have just calculated a correlation matrix as a pandas DataFrame. I would now like to obtain the highest correlations with their row and column labels intact.

E.g. from:

   a  b  c  d  e  f
a  0  1  2  3  4  5
b  1  0  3  4  5  6
c  2  3  0  5  6  7
d  3  4  5  0  7  8
e  4  5  6  7  0  9
f  5  6  7  8  9  0

get:

e f 9
f d 8
f c 7
e d 7

etc...

I have read through the pandas docs and seen the groupby methods as well as functions like head, but I'm a bit lost on how this operation is meant to be done.

Sirrah asked Jan 10 '23 23:01


1 Answer

You can use stack here, which will produce a Series with the row and column information in the index, and then call nlargest on that:

>>> df.stack()
a  a    0
   b    1
   c    2
   d    3
   e    4
   f    5
b  a    1
   b    0
   c    3
[etc.]
>>> df.stack().nlargest(6)
e  f    9
f  e    9
d  f    8
f  d    8
c  f    7
d  e    7
dtype: int64
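
Since a correlation matrix is symmetric, the output above lists every pair twice (e f and f e, and so on). If you want each pair only once, as in the question's expected output, one option is to mask everything except the upper triangle before stacking. This is only a sketch: the frame below rebuilds the question's example by hand, and the names labels, upper, pairs and top are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example matrix from the question.
labels = list("abcdef")
values = [
    [0, 1, 2, 3, 4, 5],
    [1, 0, 3, 4, 5, 6],
    [2, 3, 0, 5, 6, 7],
    [3, 4, 5, 0, 7, 8],
    [4, 5, 6, 7, 0, 9],
    [5, 6, 7, 8, 9, 0],
]
df = pd.DataFrame(values, index=labels, columns=labels)

# Boolean mask that is True only strictly above the diagonal (k=1),
# so the diagonal and the mirrored lower triangle are excluded.
upper = np.triu(np.ones(df.shape, dtype=bool), k=1)

# where() turns the masked-out cells into NaN; the explicit dropna()
# removes them regardless of how the installed pandas stack() treats NaN.
pairs = df.where(upper).stack().dropna()

# Four largest values, each (row, column) pair appearing once.
top = pairs.nlargest(4)
print(top)
```

The values come back as floats because the masked cells were NaN along the way. If you would rather have plain columns than a MultiIndex, top.reset_index() gives one row per pair with the two labels and the value.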
DSM answered Jan 30 '23 11:01