I would like to find the row (index) where the cumulative sum of the values in some column exceeds a threshold.
I can, and do, find this location using a simple loop, like below:
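def sum_to(df, col, threshold):
    s = 0  # running total of df[col]
    for idx, row in df.iterrows():
        # stop at the first row whose value pushes the total past the threshold
        if s + row[col] > threshold:
            return idx  # index label of that row
        s += row[col]
    # the threshold was never exceeded
    return len(df)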
However, is there a better, more idiomatic way to do this in pandas?
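The simplest way is probably
df[col].cumsum().searchsorted(threshold, side="right")
which returns the position (not the index label) of the first row where the running total strictly exceeds threshold, or len(df) if it never does. With the default side="left" it would also stop at a running total exactly equal to threshold. Because searchsorted relies on the cumulative sum being sorted, this assumes you have no negative numbers in your column.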
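A minimal sketch comparing the two approaches on a made-up four-row DataFrame. The helper name first_row_exceeding is hypothetical, not a pandas API. It builds a boolean mask of "running total > threshold" and takes the first True with argmax, so it does not need the cumulative sum to be sorted and therefore also works with negative values. Like searchsorted, it returns a position; use df.index[pos] to get the label when pos < len(df).

```python
import pandas as pd


def first_row_exceeding(s: pd.Series, threshold: float) -> int:
    """Hypothetical helper: position of the first row whose running total
    is strictly greater than threshold, or len(s) if none is.

    Does not assume the cumulative sum is sorted, so negative values are fine.
    """
    exceeds = (s.cumsum() > threshold).to_numpy()
    # argmax on a boolean array gives the first True; guard the all-False case
    return int(exceeds.argmax()) if exceeds.any() else len(s)


# Toy data: cumulative sum of x is 1, 3, 6, 10
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=list("abcd"))

# Sorted (non-negative) case: searchsorted with side="right" finds the first
# position whose running total is strictly greater than 5
pos = int(df["x"].cumsum().searchsorted(5, side="right"))
print(pos, df.index[pos])  # 2 c

# Works with negative values too: cumsum of [1, -1, 5] is 1, 0, 5
print(first_row_exceeding(pd.Series([1, -1, 5]), 3))  # 2
```

The mask approach scans the whole column, whereas searchsorted does a binary search, so the latter is preferable when the values are known to be non-negative.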