I have a large dataset (roughly 250k rows) with property ID numbers and survey response datetimes. I'm trying to build some kind of basic fraud detection and one of the factors I'm considering is response time between different surveys for each property/store. If there is less than 10 minutes between any surveys for a particular location, I want it flagged as TRUE under a column called "close_response".
I have nested for loops within a function that is applied to each group that achieves what I want, but it's computationally expensive and time consuming when applied to a very large dataset.
import pandas as pd
from datetime import timedelta
# Sample DataFrame
data = {'property_id': [1, 1, 1, 2, 2, 3, 3],
'response_datetime': ['2023-10-19 08:00:00', '2023-10-19 08:05:00', '2023-10-19 08:25:00', '2023-10-19 09:00:00', '2023-10-19 09:15:00', '2023-10-19 10:00:00', '2023-10-19 11:00:00']}
df = pd.DataFrame(data)
df['response_datetime'] = pd.to_datetime(df['response_datetime'])
def mark_close_responses(group):
group['close_response'] = False
for i in range(len(group)):
for j in range(i+1, len(group)):
time_diff = group.iloc[j]['response_datetime'] - group.iloc[i]['response_datetime']
if time_diff <= timedelta(minutes=10):
group.at[i, 'close_response'] = True
group.at[j, 'close_response'] = True
return group
df = df.groupby('property_id').apply(mark_close_responses)
Compute the difference between successive rows using diff
, get rows from the difference that are less than or equal to 10 minutes, use a conditional to get accurate positions per group, back fill and assign back to the original dataframe:
difference=df.groupby('property_id').response_datetime.diff()
out = difference.le(pd.Timedelta(minutes=10))
out = difference.where(difference.isna(), out).bfill()
df.assign(close_response=out)
property_id response_datetime close_response
0 1 2023-10-19 08:00:00 True
1 1 2023-10-19 08:05:00 True
2 1 2023-10-19 08:25:00 False
3 2 2023-10-19 09:00:00 False
4 2 2023-10-19 09:15:00 False
5 3 2023-10-19 10:00:00 False
6 3 2023-10-19 11:00:00 False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With