I have a DataFrame as below.
test = pd.DataFrame({'col1':[0,0,1,0,0,0,1,2,0], 'col2': [0,0,1,2,3,0,0,0,0]})
   col1  col2
0     0     0
1     0     0
2     1     1
3     0     2
4     0     3
5     0     0
6     1     0
7     2     0
8     0     0
For each column, I want to find the index of the value 1 that appears before the column's maximum. For example, in the first column the max is 2, and the index of the 1 before it is 6; in the second column the max is 3, and the index of the 1 before it is 2.
In summary, I am looking to get [6, 2] as the output for this test DataFrame. Is there a quick way to achieve this?
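A plain per-column sketch of what is being asked, just to pin down the expected output (my own illustration, assuming a 1 always occurs before each column's max, as in the example):

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

def last_one_before_max(s):
    # Slice the column up to (but not including) the row of its max,
    # then return the last index where the value equals 1.
    before_max = s.iloc[:s.to_numpy().argmax()]
    return before_max.index[before_max.eq(1)][-1]

result = [last_one_before_max(test[c]) for c in test.columns]
print(result)  # [6, 2]
```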
Use Series.mask to hide elements that aren't 1, then apply Series.last_valid_index to each column.
m = test.eq(test.max()).cumsum().gt(0) | test.ne(1)
test.mask(m).apply(pd.Series.last_valid_index)
col1 6
col2 2
dtype: int64
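To make the mask easier to follow, here is the same computation split into named steps (an illustrative breakdown, not part of the original answer):

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

# True at and after each column's max row
at_or_after_max = test.eq(test.max()).cumsum().gt(0)
# Mask everything that is at/after the max, or that isn't a 1
m = at_or_after_max | test.ne(1)

# Only the 1s strictly before each column's max survive the mask;
# last_valid_index then picks the last of them per column.
result = test.mask(m).apply(pd.Series.last_valid_index)
print(result.tolist())  # [6, 2]
```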
To vectorize with NumPy, you can use numpy.cumsum and argmax:
idx = ((test.eq(1) & test.eq(test.max()).cumsum().eq(0))
       .values
       .cumsum(axis=0)
       .argmax(axis=0))
idx
# array([6, 2])
pd.Series(idx, index=[*test])
col1 6
col2 2
dtype: int64
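To see why argmax lands on the right row, here is the boolean mask spelled out (my own breakdown):

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

# 1s that occur before the first time each column reaches its max
mask = (test.eq(1) & test.eq(test.max()).cumsum().eq(0)).values

# The running count stops growing after the last qualifying 1, and
# argmax returns the first row of that plateau, i.e. that last 1.
idx = mask.cumsum(axis=0).argmax(axis=0)
print(idx)  # [6 2]
```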
Using @cs95's idea of last_valid_index:
test.apply(lambda x: x[:x.idxmax()].eq(1)[lambda i:i].last_valid_index())
Output:
col1 6
col2 2
dtype: int64
Explained:
Index slicing cuts each column from the start up to its max value; we then look for the values equal to 1 and take the index of the last True value.
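A step-by-step version of what the lambda does for one column (an illustrative breakdown; note that label slicing with `[:x.idxmax()]` is inclusive, but the max row itself is then dropped by the `eq(1)` filter unless the max is 1):

```python
import pandas as pd

test = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                     'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

col = test['col1']
sliced = col[:col.idxmax()]        # rows up to and including the max row
ones = sliced.eq(1)[lambda i: i]   # keep only the rows that equal 1
print(ones.last_valid_index())     # 6
```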
Or as @QuangHoang suggests:
test.apply(lambda x: x[:x.idxmax()].eq(1).cumsum().idxmax())
t = test.to_numpy()
a = t.argmax(0)              # row position of each column's max
i, j = np.where(t == 1)      # row/column positions of every 1
mask = i <= a[j]             # keep only 1s at or before the max row
i = i[mask]
j = j[mask]
b = np.empty_like(a)
b.fill(-1)                   # -1 marks columns with no qualifying 1
np.maximum.at(b, j, i)       # largest qualifying row per column
pd.Series(b, test.columns)
col1 6
col2 2
dtype: int64
apply
test.apply(lambda s: max(s.index, key=lambda x: (s[x] == 1, s[x] <= s.max(), x)))
col1 6
col2 2
dtype: int64
cummax
test.eq(1).where(test.cummax().lt(test.max())).iloc[::-1].idxmax()
col1 6
col2 2
dtype: int64
I just wanted to try a new tool and do some benchmarking; see this post.
r.to_pandas_dataframe().T
10 31 100 316 1000 3162 10000
al_0 0.003696 0.003718 0.005512 0.006210 0.010973 0.007764 0.012008
wb_0 0.003348 0.003334 0.003913 0.003935 0.004583 0.004757 0.006096
qh_0 0.002279 0.002265 0.002571 0.002643 0.002927 0.003070 0.003987
sb_0 0.002235 0.002246 0.003072 0.003357 0.004136 0.004083 0.005286
sb_1 0.001771 0.001779 0.002331 0.002353 0.002914 0.002936 0.003619
cs_0 0.005742 0.005751 0.006748 0.006808 0.007845 0.008088 0.009898
cs_1 0.004034 0.004045 0.004871 0.004898 0.005769 0.005997 0.007338
pr_0 0.002484 0.006142 0.027101 0.085944 0.374629 1.292556 6.220875
pr_1 0.003388 0.003414 0.003981 0.004027 0.004658 0.004929 0.006390
pr_2 0.000087 0.000088 0.000089 0.000093 0.000107 0.000145 0.000300
fig = plt.figure(figsize=(10, 10))
ax = plt.subplot()
r.plot(ax=ax)
from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()
def al_0(test): return test.apply(lambda x: x.where(x[:x.idxmax()].eq(1)).drop_duplicates(keep='last').idxmin())
def wb_0(df): return (df.iloc[::-1].cummax().eq(df.max())&df.eq(1).iloc[::-1]).idxmax()
def qh_0(test): return (test.eq(1) & (test.index.values[:,None] < test.idxmax().values)).cumsum().idxmax()
def sb_0(test): return test.apply(lambda x: x[:x.idxmax()].eq(1)[lambda i:i].last_valid_index())
def sb_1(test): return test.apply(lambda x: x[:x.idxmax()].eq(1).cumsum().idxmax())
def cs_0(test): return (lambda m: test.mask(m).apply(pd.Series.last_valid_index))(test.eq(test.max()).cumsum().gt(0) | test.ne(1))
def cs_1(test): return pd.Series((test.eq(1) & test.eq(test.max()).cumsum().eq(0)).values.cumsum(axis=0).argmax(axis=0), test.columns)
def pr_0(test): return test.apply(lambda s: max(s.index, key=lambda x: (s[x] == 1, s[x] <= s.max(), x)))
def pr_1(test): return test.eq(1).where(test.cummax().lt(test.max())).iloc[::-1].idxmax()
def pr_2(test):
    t = test.to_numpy()
    a = t.argmax(0)
    i, j = np.where(t == 1)
    mask = i <= a[j]
    i = i[mask]
    j = j[mask]
    b = np.empty_like(a)
    b.fill(-1)
    np.maximum.at(b, j, i)
    return pd.Series(b, test.columns)
import math

def gen_test(n):
    a = np.random.randint(100, size=(n, int(math.log10(n)) + 1))
    idx = a.argmax(0)
    while (idx == 0).any():
        a = np.random.randint(100, size=(n, int(math.log10(n)) + 1))
        idx = a.argmax(0)
    for j, i in enumerate(idx):
        a[np.random.randint(i), j] = 1
    return pd.DataFrame(a).add_prefix('col')
@b.add_arguments('DataFrame Size')
def argument_provider():
    for exponent in np.linspace(1, 3, 5):
        size = int(10 ** exponent)
        yield size, gen_test(size)
b.add_functions([al_0, wb_0, qh_0, sb_0, sb_1, cs_0, cs_1, pr_0, pr_1, pr_2])
r = b.run()
A little bit of logic here:
(df.iloc[::-1].cummax().eq(df.max())&df.eq(1).iloc[::-1]).idxmax()
Out[187]:
col1 6
col2 2
dtype: int64
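Spelling that logic out (my own annotation of the one-liner):

```python
import pandas as pd

df = pd.DataFrame({'col1': [0, 0, 1, 0, 0, 0, 1, 2, 0],
                   'col2': [0, 0, 1, 2, 3, 0, 0, 0, 0]})

# Reversing first turns cummax() into a suffix maximum, so eq(df.max())
# is True exactly for rows at or before each column's max. AND-ing with
# the reversed eq(1) mask and calling idxmax() returns the first True
# from the bottom, i.e. the last 1 before the max.
result = (df.iloc[::-1].cummax().eq(df.max()) & df.eq(1).iloc[::-1]).idxmax()
print(result.tolist())  # [6, 2]
```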