I'm trying to find the best way to approach a simple time gap analysis with Python and Pandas. This is just for fun, so feel free to explain answers to help me learn more.
I started off by generating some sample data: timestamps at a 20-second frequency for 4 sessions across 2 users. The user IDs are 123 and 345.
This simulates taking a snapshot every 20 seconds to see whether the user is online or not.
import pandas as pd
session_one = pd.date_range('2016-01-01', periods=100, freq='20S')
session_two = pd.date_range('2016-02-01', periods=75, freq='20S')
session_three = pd.date_range('2016-01-01', periods=125, freq='20S')
session_four = pd.date_range('2016-02-01', periods=25, freq='20S')
user_one = [session_one, session_two]
user_two = [session_three, session_four]
data = []
for sessions in user_one:
    for dates in sessions:
        data.append([123, dates])
for sessions in user_two:
    for dates in sessions:
        data.append([345, dates])
# Make our dataframe from the generated sample data
df = pd.DataFrame(data=data, columns=['ID', 'Timestamp'])
Trying To Achieve
For each user, I want to measure the time gap between consecutive records and append it back onto the record.
SQL Approach
I have a good SQL approach, but I can't seem to replicate it with Pandas, which means joining the dataset on top of itself and offsetting the times correctly. For example, doing a Pandas merge (join) on an offset timestamp like this:
df['Timestamp'] + pd.Timedelta(seconds=20)
I think you need groupby on the ID column (one group per user) with diff:
df['diff'] = df.groupby('ID')['Timestamp'].diff()
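To see what this produces, here is a minimal sketch on a small made-up sample. The five rows, the sort step, and the diff_seconds and new_session column names are my own assumptions for illustration, not part of the question. It relies on the rows being ordered by time within each user, since diff compares each row with the one before it in the same group.

```python
import pandas as pd

# Hypothetical sample: two users, one of them with a break between snapshots
df = pd.DataFrame({
    'ID': [123, 123, 123, 345, 345],
    'Timestamp': pd.to_datetime([
        '2016-01-01 00:00:00', '2016-01-01 00:00:20', '2016-01-01 00:05:00',
        '2016-01-01 00:00:00', '2016-01-01 00:00:20',
    ]),
})

# Sort so diff() compares each row with the previous snapshot of the same user
df = df.sort_values(['ID', 'Timestamp']).reset_index(drop=True)

# Gap to the previous record within each user; the first record per user is NaT
df['diff'] = df.groupby('ID')['Timestamp'].diff()

# The same gap as a plain number of seconds (NaN for the first record per user)
df['diff_seconds'] = df['diff'].dt.total_seconds()

# Flag rows whose gap is longer than the 20-second snapshot interval
df['new_session'] = df['diff'] > pd.Timedelta(seconds=20)

print(df)
```

The first record of each user has nothing before it, so its gap is NaT rather than zero, and the comparison against 20 seconds treats it as False. If you would rather start a session on those rows too, you could combine the flag with df['diff'].isna().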