Python/Pandas group close events together based on a window

Question

I would like to group events that happened close together into a parent event.

An example is this:

import pandas as pd

df = pd.DataFrame(
    [
        ['2020-01-01 10:00', '1'],
        ['2020-01-01 10:01', '2'],
        ['2020-01-01 10:02', '3a'],
        ['2020-01-01 10:02', '3b'],
        ['2020-01-01 10:30', '4'],
        ['2020-01-01 10:50', '5'],
        ['2020-01-01 10:54', '6'],
        ['2020-01-01 10:55', '7'],
    ], columns=['event_time', 'event_id']
)
df['event_time'] = pd.to_datetime(df['event_time'])

In the above example with a window size of 1 minute what I want is for each event a list of all other events that occurred within +- 1 minute of it.

So something like:

df = pd.DataFrame(
    [
        ['2020-01-01 10:00', '1', ['2']],
        ['2020-01-01 10:01', '2', ['1','3a','3b']],
        ['2020-01-01 10:02', '3a', ['2','3b']],
        ['2020-01-01 10:02', '3b', ['3a', '2'],
        ['2020-01-01 10:30', '4', None],
        ['2020-01-01 10:50', '5', None],
        ['2020-01-01 10:54', '6', ['7']],
        ['2020-01-01 10:55', '7', ['6']],
    ], columns=['event_time', 'event_id', 'related_event_id_list']
)

I was almost able to get close with pandas merge_asof:

pd.merge_asof(df,df, on="event_time", tolerance=pd.Timedelta("1m"), allow_exact_matches=False, direction='nearest')

enter image description here

but it seems to just want to merge to one nearest event as opposed to some sort of option or way to merge all events within the tolerance for each row (and obviously just get more rows in my results so its more like an outer join on a range).

Was thinking grouping time-series events like this must be fairly common but could not really find any obvious functions in Pandas to do this or maybe I'm missing something.

As always trying to avoid writing the code myself via loops or anything like that if I can avoid it :)

jezrael · Accepted Answer

One idea is filter DataFrame per rows with remove original row by DataFrame.drop:

td = pd.Timedelta("1m")
f = lambda x, y: df.loc[df['event_time'].between(y - td, y + td),
                        'event_id'].drop(x).tolist()
df['related_event_id_list'] = [f(k, v) for k, v in df['event_time'].items()]
print (df)
           event_time event_id related_event_id_list
0 2020-01-01 10:00:00        1                   [2]
1 2020-01-01 10:01:00        2           [1, 3a, 3b]
2 2020-01-01 10:02:00       3a               [2, 3b]
3 2020-01-01 10:02:00       3b               [2, 3a]
4 2020-01-01 10:30:00        4                    []
5 2020-01-01 10:50:00        5                    []
6 2020-01-01 10:54:00        6                   [7]
7 2020-01-01 10:55:00        7                   [6]

Python/Pandas group close events together based on a window

Tags:

python

pandas

time-series

andrewm4894

1 Answers

jezrael

Recent Activity

Donate For Us

Python/Pandas group close events together based on a window

Tags:

python

pandas

time-series

andrewm4894

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us