I have a dataframe in which all values are of the same variety (e.g. a correlation matrix -- but where we expect a unique maximum). I'd like to return the row and the column of the maximum of this matrix.
I can get the location of the max across rows or columns by changing the first argument (axis) of df.idxmax(), however I haven't found a suitable way to return the row/column index of the max of the whole dataframe.
For example, I can do this in numpy:
>>>npa = np.array([[1,2,3],[4,9,5],[6,7,8]])
>>>np.where(npa == np.amax(npa))
(array([1]), array([1]))
But when I try something similar in pandas:
>>>df = pd.DataFrame([[1,2,3],[4,9,5],[6,7,8]],columns=list('abc'),index=list('def'))
>>>df.where(df == df.max().max())
     a    b    c
d  NaN  NaN  NaN
e  NaN    9  NaN
f  NaN  NaN  NaN
At a second level, what I actually want to do is to return the rows and columns of the top n values, e.g. as a Series.
E.g. for the above I'd like a function which does:
>>>topn(df,3)
b e
c f
b f
dtype: object
>>>type(topn(df,3))
pandas.core.series.Series
or even just
>>>topn(df,3)
(['b','c','b'],['e','f','f'])
a la numpy.where()
I figured out the first part:
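npa = df.values  # df.as_matrix() in older pandas; it was removed in pandas 1.0
indx,cols = np.where(npa == np.amax(npa))  # np.where returns row positions first, then column positions
([df.columns[c] for c in cols],[df.index[i] for i in indx])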
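For a unique maximum, the same lookup can probably be done without leaving pandas. This is only a sketch of an alternative, not what I used above: it assumes the frame has ordinary (non-MultiIndex) row and column labels, so that unstack() returns a Series keyed by (column label, row label).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 9, 5], [6, 7, 8]],
                  columns=list('abc'), index=list('def'))

# unstack() reshapes the frame into a Series keyed by (column label, row label)
flat = df.unstack()

# idxmax() on that Series returns the key of the first maximum as a tuple
col, row = flat.idxmax()
print(col, row)
```

If the maximum is not unique, idxmax() only reports the first occurrence, whereas the np.where version returns all of them.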
Now I need a way to get the top n. One naive idea is to copy the array and iteratively replace the top values with NaN, grabbing the index as you go. That seems inefficient. Is there a better way to get the top n values of a numpy array? Fortunately there is, through np.argpartition, but we have to use flattened indexing.
def topn(df,n):
    npa = df.values  # df.as_matrix() in older pandas
    topn_ind = np.argpartition(npa,-n,None)[-n:] #flattened ind, unsorted
    topn_ind = topn_ind[np.argsort(npa.flat[topn_ind])][::-1] #arg sort in descending order
    indx,cols = np.unravel_index(topn_ind,npa.shape) #unflatten in row-major order, matching the flattening above
    return ([df.columns[c] for c in cols],[df.index[i] for i in indx])
Trying this on the example:
>>>df = pd.DataFrame([[1,2,3],[4,9,5],[6,7,8]],columns=list('abc'),index=list('def'))
>>>topn(df,3)
(['b', 'c', 'b'], ['e', 'f', 'f'])
As desired. Mind you, the sorting was not originally asked for, but it adds little overhead if n is not large.
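For the Series form asked for at the start, a pandas-only variant might look like the following. It is a rough sketch rather than a tuned implementation: topn_series is a made-up name, and it assumes plain (non-MultiIndex) labels and a numeric frame so that nlargest() applies.

```python
import pandas as pd

def topn_series(df, n):
    # hypothetical helper: Series keyed by (column label, row label), n largest first
    top = df.unstack().nlargest(n)
    cols = list(top.index.get_level_values(0))
    rows = list(top.index.get_level_values(1))
    # column labels become the index, row labels the values
    return pd.Series(rows, index=cols)

df = pd.DataFrame([[1, 2, 3], [4, 9, 5], [6, 7, 8]],
                  columns=list('abc'), index=list('def'))
result = topn_series(df, 3)
print(result)
```

nlargest() already returns the values in descending order, so no separate sort is needed. For large frames the argpartition version above may well be faster, since this one first reshapes the whole frame into a labelled Series.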