Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Pandas data frame time difference between specific alternating values

I have a dataframe of app usage in 4 columns that looks like this:

Id  Timestamp                App_Name   Event_Type
1   2018/01/16 06:01:05     Instagram   Opened
2   2018/01/16 06:01:06     Instagram   Closed
3   2018/01/16 06:01:07     Instagram   Opened
4   2018/01/16 06:01:08     Instagram   Interaction
5   2018/01/16 06:01:09     Instagram   Interaction
6   2018/01/16 06:02:08     Instagram   Closed
7   2018/01/16 06:01:08     Instagram   Opened
8   2018/01/16 06:01:08     Instagram   Opened
9   2018/01/16 06:01:09     Instagram   Opened
10  2018/01/16 06:01:09     Instagram   Closed
11  2018/01/16 06:03:44     Instagram   Opened
12  2018/01/16 06:03:44     Instagram   Closed
13  2018/01/16 06:03:45     Instagram   Closed
14  2018/01/16 06:03:45     Instagram   Closed
15  2018/01/16 06:03:47     Instagram   Opened

I want to get time difference in seconds between each pair of 'Opened' followed by 'Closed' rows regardless of whether or not there are other 'Event_Types' between them. There can be errors where there is more than one consecutive open or close. I just want difference between last open and first close. So in this case I want time differences between:

  • Rows 2 and 1
  • 6 and 3
  • 10 and 9
  • 12 and 11

How do I do that?

Thanks!

like image 237
user5977110 Avatar asked Jun 16 '26 12:06

user5977110


2 Answers

Here's another more involved way, lineup the records side-by-side and subtracting timestamp columns.

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

df = df.set_index('Id')
df['g'] = (df['Event_Type'] == 'Opened').cumsum()

df_open = df.query('Event_Type == "Opened"').groupby('g').head(1)
df_close = df.query('Event_Type == "Closed"').groupby('g').head(1)

df_result = df_open.merge(df_close, on='g', suffixes=('_Opened', '_Closed'))
df_result['Timedelta'] = df_result['Timestamp_Closed'] - df_result['Timestamp_Opened']

df_result

Output:

     Timestamp_Opened App_Name_Opened Event_Type_Opened  g    Timestamp_Closed App_Name_Closed Event_Type_Closed       Timedelta
0 2018-01-16 06:01:05       Instagram            Opened  1 2018-01-16 06:01:06       Instagram            Closed 0 days 00:00:01
1 2018-01-16 06:01:07       Instagram            Opened  2 2018-01-16 06:02:08       Instagram            Closed 0 days 00:01:01
2 2018-01-16 06:01:09       Instagram            Opened  5 2018-01-16 06:01:09       Instagram            Closed 0 days 00:00:00
3 2018-01-16 06:03:44       Instagram            Opened  6 2018-01-16 06:03:44       Instagram            Closed 0 days 00:00:00
like image 178
Scott Boston Avatar answered Jun 18 '26 03:06

Scott Boston


Try:

out, state = [], None
for i, e in zip(df["Id"], df["Event_Type"]):
    if e == "Opened":
        state = i
    elif e == "Closed" and state is not None:
        out.append([state, i])
        state = None

print(out)

Prints:

[[1, 2], [3, 6], [9, 10], [11, 12]]

To get time differences:

df["Timestamp"] = pd.to_datetime(df["Timestamp"])

out, state = [], None
for i, e in zip(df.index, df["Event_Type"]):
    if e == "Opened":
        state = i
    elif e == "Closed" and state is not None:
        out.append(df.loc[i, "Timestamp"] - df.loc[state, "Timestamp"])
        state = None

print(out)

Prints:

[Timedelta('0 days 00:00:01'), Timedelta('0 days 00:01:01'), Timedelta('0 days 00:00:00'), Timedelta('0 days 00:00:00')]
like image 22
Andrej Kesely Avatar answered Jun 18 '26 02:06

Andrej Kesely