Get Column and Row Index for Highest Value in Dataframe Pandas

Tags: python, pandas, dataframe

I'd like to know if there's a way to find the location (column and row index) of the highest value in a dataframe. So if for example my dataframe looks like this:

   A         B         C         D         E
0  100       9         1         12        6
1  80        10        67        15        91
2  20        67        1         56        23
3  12        51        5         10        58
4  73        28        72        25        1

How do I get a result that looks like this: [0, 'A'] using Pandas?

asked Dec 29 '17 02:12

christfan868

1 Answer

Use `np.argmax`

NumPy's argmax can be helpful:

>>> df.stack().index[np.argmax(df.values)]
(0, 'A')

In steps

df.values is a two-dimensional NumPy array:

>>> df.values
array([[100,   9,   1,  12,   6],
       [ 80,  10,  67,  15,  91],
       [ 20,  67,   1,  56,  23],
       [ 12,  51,   5,  10,  58],
       [ 73,  28,  72,  25,   1]])

argmax gives you the index for the maximum value for the "flattened" array:

>>> np.argmax(df.values)
0

Now, you can use this index to find the row-column location on the stacked dataframe:

>>> df.stack().index[0]
(0, 'A')
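
For anyone who wants to run the steps above end to end, here is a minimal, self-contained sketch that rebuilds the frame from the question. It assumes the frame has no missing values: in older pandas versions `stack()` drops NaN entries, which would shift the positions relative to the flattened array.

```python
import numpy as np
import pandas as pd

# The example frame from the question.
df = pd.DataFrame(
    {
        "A": [100, 80, 20, 12, 73],
        "B": [9, 10, 67, 51, 28],
        "C": [1, 67, 1, 5, 72],
        "D": [12, 15, 56, 10, 25],
        "E": [6, 91, 23, 58, 1],
    }
)

# Position of the maximum in the row-major flattened array.
flat_pos = np.argmax(df.values)

# Stacking yields a Series whose MultiIndex lists (row, column) pairs
# in the same row-major order as the flattened array.
stacked = df.stack()
location = stacked.index[flat_pos]
print(location)
```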

Fast Alternative

If you need speed, do as few steps as possible. Working only on the NumPy array and finding the indices with np.argmax seems best:

v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]

Result:

[0, 'A']
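
As a side note, `np.unravel_index` also accepts a scalar index, so the list wrapping and the `x[0]` unpacking are not strictly needed. A sketch of that variant follows; the helper name `max_location` and the small frame with string row labels are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def max_location(df):
    """Return [row_label, column_label] of the first maximum (hypothetical helper)."""
    v = df.to_numpy()
    # argmax gives the flat position; unravel_index converts it to (row, col).
    i, j = np.unravel_index(np.argmax(v), v.shape)
    return [df.index[i], df.columns[j]]


# Illustrative frame with non-default row labels.
df = pd.DataFrame(
    [[1, 9, 1], [80, 10, 167], [20, 67, 1]],
    columns=list("ABC"),
    index=["x", "y", "z"],
)
print(max_location(df))
```

Because the positions are mapped back through `df.index` and `df.columns`, this returns labels rather than integer positions, so it also works for frames without a default integer index.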

Timings

Timing works best for large data frames:

df = pd.DataFrame(data=np.arange(int(1e6)).reshape(-1,5), columns=list('ABCDE'))

Sorted slowest to fastest:

Mask:

%timeit df.mask(~(df==df.max().max())).stack().index.tolist()
33.4 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Stack-idmax

%timeit list(df.stack().idxmax())
17.1 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Stack-argmax

%timeit df.stack().index[np.argmax(df.values)]
14.8 ms ± 392 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Where

%%timeit
i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))

4.45 ms ± 84.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
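
One difference worth keeping in mind: the comparison inside `np.where` matches every cell equal to the maximum, whereas `np.argmax` reports only the first one in row-major order. The timed snippet keeps just the first match, but if ties matter, all of them can be collected. A small sketch on a made-up frame with a tied maximum:

```python
import numpy as np
import pandas as pd

# Illustrative frame in which the maximum (5) occurs twice.
df = pd.DataFrame({"A": [5, 1], "B": [2, 5]})
v = df.to_numpy()

# np.where returns the row and column positions of every matching cell.
rows, cols = np.where(v == v.max())
all_locations = [(df.index[i], df.columns[j]) for i, j in zip(rows, cols)]

# np.argmax only reports the first occurrence.
first_i, first_j = np.unravel_index(np.argmax(v), v.shape)
first_location = (df.index[first_i], df.columns[first_j])

print(all_locations)
print(first_location)
```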

Argmax-unravel_index

%%timeit

v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]

499 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Compare

d = {'name': ['Mask', 'Stack-idmax', 'Stack-argmax', 'Where', 'Argmax-unravel_index'],
     'time': [33.4, 17.1, 14.8, 4.45, 499],
     'unit': ['ms', 'ms', 'ms', 'ms', 'µs']}


timings = pd.DataFrame(d)
timings['seconds'] = timings.time * timings.unit.map({'ms': 1e-3, 'µs': 1e-6})
timings['factor slower'] = timings.seconds / timings.seconds.min()
timings.sort_values('factor slower')

Output:

                   name    time unit   seconds  factor slower
4  Argmax-unravel_index  499.00   µs  0.000499       1.000000
3                 Where    4.45   ms  0.004450       8.917836
2          Stack-argmax   14.80   ms  0.014800      29.659319
1           Stack-idmax   17.10   ms  0.017100      34.268537
0                  Mask   33.40   ms  0.033400      66.933868

So the "Argmax-unravel_index" version seems to be one to nearly two orders of magnitude faster for large data frames, i.e. where speed often matters most.
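
The numbers above come from IPython's `%timeit`. Outside IPython, a rough comparison can be sketched with the standard `timeit` module. The function names and the repeat count of 5 below are arbitrary choices for illustration, only two of the five approaches are included, and the absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same large frame as in the timings above.
df = pd.DataFrame(np.arange(int(1e6)).reshape(-1, 5), columns=list("ABCDE"))


def stack_idxmax(df):
    return list(df.stack().idxmax())


def argmax_unravel(df):
    v = df.values
    i, j = np.unravel_index(np.argmax(v), v.shape)
    return [df.index[i], df.columns[j]]


candidates = {"Stack-idmax": stack_idxmax, "Argmax-unravel_index": argmax_unravel}

# Both approaches should agree before their timings are compared.
results = {name: func(df) for name, func in candidates.items()}

runs = 5
for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(df), number=runs) / runs
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```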

answered Oct 05 '22 10:10

Mike Müller
