I have a dataframe of app usage in 4 columns that looks like this:
Id Timestamp App_Name Event_Type
1 2018/01/16 06:01:05 Instagram Opened
2 2018/01/16 06:01:06 Instagram Closed
3 2018/01/16 06:01:07 Instagram Opened
4 2018/01/16 06:01:08 Instagram Interaction
5 2018/01/16 06:01:09 Instagram Interaction
6 2018/01/16 06:02:08 Instagram Closed
7 2018/01/16 06:01:08 Instagram Opened
8 2018/01/16 06:01:08 Instagram Opened
9 2018/01/16 06:01:09 Instagram Opened
10 2018/01/16 06:01:09 Instagram Closed
11 2018/01/16 06:03:44 Instagram Opened
12 2018/01/16 06:03:44 Instagram Closed
13 2018/01/16 06:03:45 Instagram Closed
14 2018/01/16 06:03:45 Instagram Closed
15 2018/01/16 06:03:47 Instagram Opened
I want to get time difference in seconds between each pair of 'Opened' followed by 'Closed' rows regardless of whether or not there are other 'Event_Types' between them. There can be errors where there is more than one consecutive open or close. I just want difference between last open and first close. So in this case I want time differences between:
How do I do that?
Thanks!
Here's another more involved way, lineup the records side-by-side and subtracting timestamp columns.
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.set_index('Id')
df['g'] = (df['Event_Type'] == 'Opened').cumsum()
df_open = df.query('Event_Type == "Opened"').groupby('g').head(1)
df_close = df.query('Event_Type == "Closed"').groupby('g').head(1)
df_result = df_open.merge(df_close, on='g', suffixes=('_Opened', '_Closed'))
df_result['Timedelta'] = df_result['Timestamp_Closed'] - df_result['Timestamp_Opened']
df_result
Output:
Timestamp_Opened App_Name_Opened Event_Type_Opened g Timestamp_Closed App_Name_Closed Event_Type_Closed Timedelta
0 2018-01-16 06:01:05 Instagram Opened 1 2018-01-16 06:01:06 Instagram Closed 0 days 00:00:01
1 2018-01-16 06:01:07 Instagram Opened 2 2018-01-16 06:02:08 Instagram Closed 0 days 00:01:01
2 2018-01-16 06:01:09 Instagram Opened 5 2018-01-16 06:01:09 Instagram Closed 0 days 00:00:00
3 2018-01-16 06:03:44 Instagram Opened 6 2018-01-16 06:03:44 Instagram Closed 0 days 00:00:00
Try:
out, state = [], None
for i, e in zip(df["Id"], df["Event_Type"]):
if e == "Opened":
state = i
elif e == "Closed" and state is not None:
out.append([state, i])
state = None
print(out)
Prints:
[[1, 2], [3, 6], [9, 10], [11, 12]]
To get time differences:
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
out, state = [], None
for i, e in zip(df.index, df["Event_Type"]):
if e == "Opened":
state = i
elif e == "Closed" and state is not None:
out.append(df.loc[i, "Timestamp"] - df.loc[state, "Timestamp"])
state = None
print(out)
Prints:
[Timedelta('0 days 00:00:01'), Timedelta('0 days 00:01:01'), Timedelta('0 days 00:00:00'), Timedelta('0 days 00:00:00')]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With