
The best way to perform time gap analysis with Pandas?

I'm trying to find the best way to approach a simple time gap analysis with Python and Pandas. This is just for fun, so feel free to explain answers to help me learn more.

I started by generating some sample data: timestamps at a 20-second frequency for 4 sessions across 2 users. The users are 123 and 345.

This simulates taking a snapshot every 20 seconds to check whether the user is online.

import pandas as pd

session_one = pd.date_range('2016-01-01', periods=100, freq='20S')
session_two = pd.date_range('2016-02-01', periods=75, freq='20S')
session_three = pd.date_range('2016-01-01', periods=125, freq='20S')
session_four = pd.date_range('2016-02-01', periods=25, freq='20S')

user_one = [session_one, session_two]
user_two = [session_three, session_four]

data = []

for sessions in user_one:
    for dates in sessions:
        data.append([123,dates])

for sessions in user_two:
    for dates in sessions:
        data.append([345,dates])

# Build the DataFrame from the generated sample data
df = pd.DataFrame(data=data, columns=['ID', 'Timestamp'])

Trying To Achieve

I want to measure the time gap between consecutive records for each user and append it back onto each record.

SQL Approach

I have a good SQL approach, but I can't seem to replicate it in Pandas: joining the dataset on top of itself with the times offset correctly. For example, for a Pandas merge (join) I tried an offset like this:

df['Timestamp'] + pd.Timedelta(seconds=20)  # shift each timestamp forward by 20 seconds
Fastidious asked Sep 02 '25 02:09


1 Answer

I think you need to group by each user's ID and use diff (the column in your DataFrame is named 'ID'):

df['diff'] = df.groupby('ID')['Timestamp'].diff()
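To see what a per-user diff produces, here is a minimal sketch on a small stand-in for the question's data. The shortened sample, the `gap_seconds` and `is_gap` column names, and the "longer than 20 seconds counts as a gap" rule are assumptions made for illustration, not part of the original question or answer.

```python
import pandas as pd

# Small stand-in for the question's data: two users with 20-second snapshots.
# User 123 has a second session a month later; user 345 misses one snapshot.
df = pd.DataFrame({
    'ID': [123, 123, 123, 123, 345, 345, 345],
    'Timestamp': pd.to_datetime([
        '2016-01-01 00:00:00', '2016-01-01 00:00:20', '2016-01-01 00:00:40',
        '2016-02-01 00:00:00',
        '2016-01-01 00:00:00', '2016-01-01 00:00:20', '2016-01-01 00:01:00',
    ]),
})

# diff() compares each row with the previous row of the same group,
# so sort first to guarantee chronological order within each user.
df = df.sort_values(['ID', 'Timestamp']).reset_index(drop=True)
df['diff'] = df.groupby('ID')['Timestamp'].diff()

# Convert the Timedelta to seconds. The first record of each user has
# no predecessor, so its diff is NaT and its gap_seconds is NaN.
df['gap_seconds'] = df['diff'].dt.total_seconds()

# Assumed rule: anything longer than the 20-second snapshot interval is a gap.
df['is_gap'] = df['gap_seconds'] > 20

print(df)
```

Because the diff is computed within each group, the first row for user 345 is not compared against the last row for user 123, which is the main pitfall of calling `diff()` on the whole column.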
jezrael answered Sep 05 '25 08:09