I'm trying to find the max since condition was true in a pandas dataframe. I've searched for similar questions and read the documentation but haven't been able to find this problem discussed. To illustrate, I want a function that will return the maxsince column below.
In [84]: df
Out[84]:
a b maxsince
2007-04-27 11:00:00 1 True 1
2007-04-27 11:30:00 5 False 5
2007-04-27 12:00:00 3 False 5
2007-04-27 12:30:00 2 True 2
2007-04-27 13:00:00 2 False 2
2007-04-27 13:30:00 7 True 7
2007-04-27 14:00:00 3 False 7
2007-04-27 14:30:00 4 False 7
I'm having trouble calculating this without resorting to looping. What would be the most efficient way? Thanks.
How about:
>>> df.groupby(df["b"].cumsum())["a"].cummax()
2007-04-27 11:00:00 1
11:30:00 5
12:00:00 5
12:30:00 2
13:00:00 2
13:30:00 7
14:00:00 7
14:30:00 7
dtype: int64
This works because
>>> df["b"].cumsum()
2007-04-27 11:00:00 1
11:30:00 1
12:00:00 1
12:30:00 2
13:00:00 2
13:30:00 3
14:00:00 3
14:30:00 3
Name: b, dtype: int32
gives us a new value whenever we see a True
. You might have to patch it a bit depending on what you want to happen when the first value is False
, but I'll leave that as an exercise for the reader. ;^)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With