How can I calculate the time (number of days) between "events" in a Pandas time series? For example, given the series below, I'd like to know, for each day, how many days have passed since the last True:
            event
2010-01-01  False
2010-01-02   True
2010-01-03  False
2010-01-04  False
2010-01-05   True
2010-01-06  False
The way I've done it seems overcomplicated, so I'm hoping for something more elegant. Obviously a for loop iterating over the rows would work, but ideally I'm looking for a vectorized (scalable) solution. My current attempt is below:
import pandas as pd

date_range = pd.date_range('2010-01-01', '2010-01-06')
df = pd.DataFrame([False, True, False, False, True, False], index=date_range, columns=['event'])

# Date of each event, placed only on the event rows
event_dates = df.index[df['event']]
df2 = pd.DataFrame(event_dates, index=event_dates, columns=['max_event_date'])
df = df.join(df2)

# Carry the most recent event date forward, then subtract it from the index
df['max_event_date'] = df['max_event_date'].cummax(axis=0, skipna=False)
df['days_since_event'] = df.index - df['max_event_date']
            event max_event_date  days_since_event
2010-01-01  False            NaT               NaT
2010-01-02   True     2010-01-02            0 days
2010-01-03  False     2010-01-02            1 days
2010-01-04  False     2010-01-02            2 days
2010-01-05   True     2010-01-05            0 days
2010-01-06  False     2010-01-05            1 days
Continuing to improve on this answer, and hoping that someone comes in with 'the' pythonic way. Until then, I think this final update works best.
import numpy as np

last = pd.to_datetime(np.nan)  # NaT until the first event is seen

def elapsed(row):
    global last  # must precede any use of 'last' inside the function
    if not row.event:
        return row.name - last
    else:
        last = row.name
        return row.name - last

df['elapsed'] = df.apply(elapsed, axis=1)
df
            event  elapsed
2010-01-01  False      NaT
2010-01-02   True   0 days
2010-01-03  False   1 days
2010-01-04  False   2 days
2010-01-05   True   0 days
2010-01-06  False   1 days
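For what it's worth, the same idea can be written without a global or apply by forward-filling the timestamp of the most recent event. This is just a sketch along the same lines (not part of the original answer), assuming the df built above with its boolean 'event' column:
# Sketch: forward-fill the last event timestamp, then subtract it from the index
last_event = df.index.to_series().where(df['event']).ffill()
df['elapsed'] = df.index - last_event  # NaT before the first event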
:::::::::::::
Leaving previous answers below although they are sub-optimal
:::::::::
Instead of making multiple passes through, it seems easier to just loop through the indexes:
df['elapsed'] = 0
# Positional indexing avoids subtracting an integer from a Timestamp
# and the chained-assignment warning.
for i in range(1, len(df)):
    if not df['event'].iloc[i]:
        df.loc[df.index[i], 'elapsed'] = df['elapsed'].iloc[i - 1] + 1
::::::::::::
Let's say 'Trues' are your event of interest.
trues = df[df.event].copy()
trues['Dates'] = trues.index  # need a column because .diff() doesn't work on the index
trues['Elapsed'] = trues['Dates'].diff()  # gap between consecutive events, not per-day elapsed time
A one-pass solution would certainly be ideal, but here's a multi-pass solution using only (presumably) cythonized pandas functions:
import numpy as np
import pandas as pd

def get_delay(ds):
    # Cumulative count of non-events; keep its value at event rows, forward-fill,
    # and the difference is the number of non-event rows since the last event.
    x1 = (~ds).cumsum()
    x2 = x1.where(ds, np.nan).ffill()
    return x1 - x2

date_range = pd.date_range('2010-01-01', '2010-01-06')
ds = pd.Series([False, True, False, False, True, False], index=date_range)
pd.concat([ds, get_delay(ds)], axis=1)
            Event   Last
2010-01-01  False   NaN
2010-01-02  True    0
2010-01-03  False   1
2010-01-04  False   2
2010-01-05  True    0
2010-01-06  False   1
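Note that get_delay returns the number of rows since the last event, which matches days here only because the index is daily. If actual Timedeltas are wanted, one option (a sketch assuming a fixed daily frequency, not part of the original answer) is to scale the counts:
# Sketch: convert row counts to Timedeltas for a daily index (NaN becomes NaT)
delays = get_delay(ds) * pd.Timedelta(days=1)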
And interestingly it seems to perform a little better in some quick benchmarks, possibly due to avoiding row-wise operations:
%%timeit -n 1000
def get_delay(ds):
    x1 = (~ds).cumsum()
    x2 = x1.where(ds, np.nan).ffill()
    return x1 - x2
n = 100
events = np.random.choice([True, False], size=n)
date_range = pd.date_range('2010-01-01', periods=n)
df = pd.DataFrame(events, index=date_range, columns=['event'])
get_delay(df['event'])
1000 loops, best of 3: 1.09 ms per loop
Versus the row-wise apply approach with a global:
%%timeit -n 1000
last = pd.to_datetime(np.nan)
def elapsed(row):
    global last  # must precede any use of 'last' inside the function
    if not row.event:
        return row.name - last
    else:
        last = row.name
        return row.name - last
n = 100
events = np.random.choice([True, False], size=n)
date_range = pd.date_range('2010-01-01', periods=n)
df = pd.DataFrame(events, index=date_range, columns=['event'])
df.apply(elapsed, axis=1)
1000 loops, best of 3: 2.4 ms per loop
Perhaps there's some nuance in that comparison that makes it unfair, but either way the version built only from pandas operations certainly isn't any slower, and here it's actually faster.