Find all indices of maximum in Pandas DataFrame

Tags: python, pandas

I need to find all indices where the per-row maximum occurs in a Pandas DataFrame. For instance, if I have a DataFrame like this:

   cat1  cat2  cat3
0     0     2     2
1     3     0     1
2     1     1     0

then the method I am looking for would yield a result like:

[['cat2', 'cat3'],
 ['cat1'],
 ['cat1', 'cat2']]

This is a list of lists, but some other data structure is also okay.

I cannot use df.idxmax(axis=1), because it only yields the first maximum.

asked Feb 07 '14 12:02

RafG

1 Answer

Here is the information, in a different data structure:

In [8]: df = pd.DataFrame({'cat1':[0,3,1], 'cat2':[2,0,1], 'cat3':[2,1,0]})

In [9]: df
Out[9]: 
   cat1  cat2  cat3
0     0     2     2
1     3     0     1
2     1     1     0

[3 rows x 3 columns]

In [10]: rowmax = df.max(axis=1)

The max values are indicated by True values:

In [82]: df.values == rowmax.values[:,None]
Out[82]: 
array([[False,  True,  True],
       [ True, False, False],
       [ True,  True, False]], dtype=bool)

np.where returns the positions at which the boolean array above is True.

In [84]: np.where(df.values == rowmax.values[:,None])
Out[84]: (array([0, 0, 1, 2, 2]), array([1, 2, 0, 0, 1]))

The first array holds the row positions (axis=0) and the second array the column positions (axis=1). Each array has five entries because five cells of the boolean array are True.
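If labels are more convenient than integer positions, the two arrays can be mapped back through df.index and df.columns. This is only a minimal sketch on the example frame; the names mask and pairs are mine, and it assumes integer dtype so that exact equality with the row maximum is safe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
rowmax = df.max(axis=1)

# Boolean mask: True wherever a cell equals the maximum of its row.
mask = df.values == rowmax.values[:, None]

# Positions of the True cells, as (row positions, column positions).
rows, cols = np.where(mask)

# Translate the positions into (row label, column label) pairs.
pairs = list(zip(df.index[rows], df.columns[cols]))
```

Each pair names one cell that attains its row maximum, in row-major order.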

You could use itertools.groupby to build the list of lists you posted, though perhaps you don't need this given the data structures above:

In [46]: import itertools as IT

In [47]: import operator

In [48]: idx = np.where(df.values == rowmax.values[:,None])

In [49]: groups = IT.groupby(zip(*idx), key=operator.itemgetter(0))

In [50]: [[df.columns[j] for i, j in grp] for k, grp in groups]
Out[50]: [['cat2', 'cat3'], ['cat1'], ['cat1', 'cat2']]
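The same list of lists can probably be built without itertools by indexing df.columns with each row of the boolean mask. This is an alternative sketch rather than the approach above; the names is_max and result are mine, and it relies on DataFrame.eq with axis=0 to align the row maxima with the rows.

```python
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})

# Compare every cell with its row maximum, aligning on the row index.
is_max = df.eq(df.max(axis=1), axis=0)

# For each row of the mask, keep the column labels where it is True.
result = [list(df.columns[row]) for row in is_max.values]
```

One inner list is produced per row, in the original row order.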

answered Oct 15 '22 01:10

unutbu
