Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get Column and Row Index for Highest Value in Dataframe Pandas

I'd like to know if there's a way to find the location (column and row index) of the highest value in a dataframe. So if for example my dataframe looks like this:

   A         B         C         D         E
0  100       9         1         12        6
1  80        10        67        15        91
2  20        67        1         56        23
3  12        51        5         10        58
4  73        28        72        25        1

How do I get a result that looks like this: [0, 'A'] using Pandas?

like image 613
christfan868 Avatar asked Dec 29 '17 02:12

christfan868


People also ask

How do you find the index of a max value in a DataFrame?

Pandas DataFrame idxmax() Method The idxmax() method returns a Series with the index of the maximum value for each column. By specifying the column axis ( axis='columns' ), the idxmax() method returns a Series with the index of the maximum value for each row.

How do you find the index of a column with maximum value?

To get the position of the maximum value in a range (i.e. a list, table, or row), you can use the MAX function together with the MATCH function. Which returns the number 4, representing the position in this list of the the most expensive property. The MAX function first extracts the maximum value from the range C3:C11.

How do I find the maximum row in a DataFrame?

To find maximum value of every row in DataFrame just call the max() member function with DataFrame object with argument axis=1 i.e. It returned a series with row index label and maximum value of each row.


1 Answers

Use np.argmax

NumPy's argmaxcan be helpful:

>>> df.stack().index[np.argmax(df.values)]
(0, 'A')

In steps

df.values is a two-dimensional NumPy array:

>>> df.values
array([[100,   9,   1,  12,   6],
       [ 80,  10,  67,  15,  91],
       [ 20,  67,   1,  56,  23],
       [ 12,  51,   5,  10,  58],
       [ 73,  28,  72,  25,   1]])

argmax gives you the index for the maximum value for the "flattened" array:

>>> np.argmax(df.values)
0

Now, you can use this index to find the row-column location on the stacked dataframe:

>>> df.stack().index[0]
(0, 'A')

Fast Alternative

If you need it fast, do as few steps as possible. Working only on the NumPy array to find the indices np.argmax seems best:

v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]

Result:

[0, 'A']

Timings

Timing works best for lareg data frames:

df = pd.DataFrame(data=np.arange(int(1e6)).reshape(-1,5), columns=list('ABCDE'))

Sorted slowest to fastest:

Mask:

%timeit df.mask(~(df==df.max().max())).stack().index.tolist()
33.4 ms ± 982 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Stack-idmax

%timeit list(df.stack().idxmax())
17.1 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Stack-argmax

%timeit df.stack().index[np.argmax(df.values)]
14.8 ms ± 392 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Where

%%timeit
i,j = np.where(df.values == df.values.max())
list((df.index[i].values.tolist()[0],df.columns[j].values.tolist()[0]))

4.45 ms ± 84.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Argmax-unravel_index

%%timeit

v = df.values
i, j = [x[0] for x in np.unravel_index([np.argmax(v)], v.shape)]
[df.index[i], df.columns[j]]

499 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Compare

d = {'name': ['Mask', 'Stack-idmax', 'Stack-argmax', 'Where', 'Argmax-unravel_index'],
     'time': [33.4, 17.1, 14.8, 4.45, 499],
     'unit': ['ms', 'ms', 'ms', 'ms', 'µs']}


timings = pd.DataFrame(d)
timings['seconds'] = timings.time * timings.unit.map({'ms': 1e-3, 'µs': 1e-6})
timings['factor slower'] = timings.seconds / timings.seconds.min()
timings.sort_values('factor slower')

Output:

                   name    time unit   seconds  factor slower
4  Argmax-unravel_index  499.00   µs  0.000499       1.000000
3                 Where    4.45   ms  0.004450       8.917836
2          Stack-argmax   14.80   ms  0.014800      29.659319
1           Stack-idmax   17.10   ms  0.017100      34.268537
0                  Mask   33.40   ms  0.033400      66.933868

So the "Argmax-unravel_index" version seems to be one to nearly two orders of magnitude faster for large data frames, i.e. where often speeds matters most.

like image 61
Mike Müller Avatar answered Oct 05 '22 10:10

Mike Müller