Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find max since condition in pandas timeseries dataframe

I'm trying to find the max since condition was true in a pandas dataframe. I've searched for similar questions and read the documentation but haven't been able to find this problem discussed. To illustrate, I want a function that will return the maxsince column below.

In [84]: df
Out[84]: 
                     a      b  maxsince
2007-04-27 11:00:00  1   True         1
2007-04-27 11:30:00  5  False         5
2007-04-27 12:00:00  3  False         5
2007-04-27 12:30:00  2   True         2
2007-04-27 13:00:00  2  False         2
2007-04-27 13:30:00  7   True         7
2007-04-27 14:00:00  3  False         7
2007-04-27 14:30:00  4  False         7

I'm having trouble calculating this without resorting to looping. What would be the most efficient way? Thanks.

like image 917
user2205 Avatar asked Oct 31 '13 19:10

user2205


1 Answers

How about:

>>> df.groupby(df["b"].cumsum())["a"].cummax()
2007-04-27  11:00:00    1
            11:30:00    5
            12:00:00    5
            12:30:00    2
            13:00:00    2
            13:30:00    7
            14:00:00    7
            14:30:00    7
dtype: int64

This works because

>>> df["b"].cumsum()
2007-04-27  11:00:00    1
            11:30:00    1
            12:00:00    1
            12:30:00    2
            13:00:00    2
            13:30:00    3
            14:00:00    3
            14:30:00    3
Name: b, dtype: int32

gives us a new value whenever we see a True. You might have to patch it a bit depending on what you want to happen when the first value is False, but I'll leave that as an exercise for the reader. ;^)

like image 94
DSM Avatar answered Oct 10 '22 15:10

DSM