I am trying to calculate the average time difference, in hours/minutes/seconds, iterating on a field - in my example, for each different ip address. Moreover, a column containing the count of each ip row.
My dataframe looks like:
date ipAddress
2016-08-08 00:39:00 98.249.244.22
2016-08-08 13:03:00 98.249.244.22
2016-08-20 21:37:00 98.211.135.179
2016-08-21 16:11:00 98.211.135.179
2016-08-21 16:19:00 98.211.135.179
2016-08-25 01:30:00 98.248.215.244
My desired output:
ipAddress avg_time_diff count
98.249.244.22 avg_diff_1 2
98.211.135.179 avg_diff_2 3
98.248.215.244 0 1
A reproducicle df:
{u'date': {3233: Timestamp('2016-08-08 00:39:00'),
3551: Timestamp('2016-08-08 13:03:00'),
349036: Timestamp('2016-08-20 21:37:00'),
349040: Timestamp('2016-08-21 16:11:00'),
349049: Timestamp('2016-08-21 16:19:00'),
378843: Timestamp('2016-08-25 01:30:00')},
u'ipAddress': {3233: u'98.249.244.22',
3551: u'98.249.244.22',
49036: u'98.211.135.179',
349040: u'98.211.135.179',
349049: u'98.211.135.179',
378843: u'98.248.215.244'}}
I have no clue where to start, it tried timediff but I am not sure I've understood how it works and how to iterate over rows as a "window function". Thanks in advance
To get a time difference in seconds, use the timedelta. total_seconds() methods. Multiply the total seconds by 1000 to get the time difference in milliseconds. Divide the seconds by 60 to get the difference in minutes.
See applying different functions to dataframe columns:
(df.groupby('ipAddress')
.date
.agg({'count': 'count',
'avg_time_diff': lambda group: group.sort_values().diff().mean()}))
# Output
# count avg_time_diff
# ipAddress
# 98.211.135.179 2 00:08:00
# 98.248.215.244 1 NaT
# 98.249.244.22 2 12:24:00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With