
In pandas, how to find the row/index where the cumulative sum is greater than a threshold?

Tags:

python

pandas

I would like to find the row (index) where the cumulative sum of the values in some column exceeds a threshold.

I can, and do, find this location using a simple loop, like below:

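def sum_to(df, col, threshold):
    # Walk the rows in order, keeping a running total of the column.
    total = 0
    for idx, row in df.iterrows():
        if total + row[col] > threshold:
            # This row pushes the running total past the threshold.
            return idx
        total += row[col]

    # The running total never exceeds the threshold.
    return len(df)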

However, I would like to know if there is a better/nicer way to achieve this in Pandas.

mibm asked Dec 18 '22 02:12


1 Answer

The simplest way is probably

df[col].cumsum().searchsorted(threshold)

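but this assumes that you have no negative numbers in your column, because searchsorted only works when the cumulative sum is non-decreasing. Note that it returns an integer position rather than an index label, and that with the default side='left' a cumulative sum exactly equal to the threshold already counts as a match; pass side='right' if the sum must strictly exceed the threshold.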
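As a rough sketch of how this could look in practice, the two helpers below (the names first_pos_exceeding_sorted and first_row_exceeding are made up for illustration, as is the sample data) compare the searchsorted approach with a boolean-mask fallback that should also cope with negative values. The fallback is not part of the answer above; it is one possible alternative, assuming a plain scan of the cumulative sum is acceptable.

```python
import pandas as pd


def first_pos_exceeding_sorted(df, col, threshold):
    """Position of the first row whose running total is > threshold.

    Assumes the column has no negative values, so the cumulative sum
    is sorted. Returns len(df) if the threshold is never exceeded.
    """
    # side="right" skips running totals that merely equal the threshold.
    return int(df[col].cumsum().searchsorted(threshold, side="right"))


def first_row_exceeding(df, col, threshold):
    """Index label of the first row whose running total is > threshold.

    Hypothetical fallback that does not need a sorted cumulative sum.
    Returns None if the threshold is never exceeded.
    """
    exceeded = df[col].cumsum() > threshold
    if not exceeded.any():
        return None
    # idxmax on a boolean Series gives the label of the first True.
    return exceeded.idxmax()


# Illustrative data: running totals are 1, 3, 6, 10.
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=list("abcd"))

pos = first_pos_exceeding_sorted(df, "x", 3)   # 2
label = first_row_exceeding(df, "x", 3)        # 'c'
print(pos, df.index[pos], label)
```

If an index label is wanted from the searchsorted result, df.index[pos] converts the position, provided pos is less than len(df).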

Isaac answered Dec 27 '22 02:12