I have this data:
i,ID,url,used_at,active_seconds,domain,search_term
322015,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-12-31 09:16:05,35,vk.com,None
838267,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed,2015-12-31 09:16:38,54,vk.com,None
838271,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:17:32,34,vk.com,None
322026,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos&z=photo143297356_397216312%2Ffeed1_143297356_1451504298,2015-12-31 09:18:06,4,vk.com,None
838275,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:18:10,4,vk.com,None
322028,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:14,8,vk.com,None
322029,0120bc30e78ba5582617a9f3d6dfd8ca,megarand.ru/contest/121070,2015-12-31 09:18:22,16,megarand.ru,None
1870917,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=comments,2015-12-31 09:18:38,6,vk.com,None
1354612,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-12-31 09:18:44,56,vk.com,None
I need to group by ID and then group by used_at, starting a new group whenever the difference between two consecutive rows is more than 500 seconds.
I tried:
df.groupby([df['ID', 'used_at'],pd.TimeGrouper(freq='5Min')])
but it raises KeyError: ('ID', 'used_at'), because df['ID', 'used_at'] looks up a single column whose name is the tuple ('ID', 'used_at') instead of selecting two columns.
IIUC you need:
df['used_at'] = pd.to_datetime(df['used_at'])  # diff() needs a datetime column
print (df.groupby('ID')['used_at'].diff().dt.seconds)
0 NaN
1 33.0
2 54.0
3 34.0
4 4.0
5 4.0
6 8.0
7 16.0
8 6.0
Name: used_at, dtype: float64
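If the goal is to start a new group whenever the gap exceeds 500 seconds, one common pattern is to compare the diff against the threshold and take a cumulative sum. This is a minimal sketch, assuming consecutive rows per ID define a session; the session column name is my own choice:
df['used_at'] = pd.to_datetime(df['used_at'])
df = df.sort_values(['ID', 'used_at'])
# total_seconds() is safer than .dt.seconds for gaps longer than a day
gap = df.groupby('ID')['used_at'].diff().dt.total_seconds()
# each True marks the start of a new session; cumsum gives it a running id
df['session'] = (gap > 500).cumsum()
print (df.groupby(['ID', 'session'])['active_seconds'].sum())
Grouping by ['ID', 'session'] keeps sessions separate per ID even though the session counter runs globally.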
If you want to use TimeGrouper, you first need to set a DatetimeIndex; then you can use any aggregating function, e.g. sum:
df['used_at'] = pd.to_datetime(df.used_at)
# move used_at into the index so TimeGrouper can bin on it
df.set_index('used_at', inplace=True)
print (df.groupby([df['ID'], pd.TimeGrouper(freq='5Min')]).sum())
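Note that pd.TimeGrouper was deprecated in later pandas versions in favour of pd.Grouper, which takes the same freq argument:
# pd.Grouper is the modern replacement for pd.TimeGrouper
print (df.groupby([df['ID'], pd.Grouper(freq='5Min')]).sum())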
Another way is to copy the column used_at to the index:
df['used_at'] = pd.to_datetime(df.used_at)
# passing the Series (rather than the column name) keeps used_at as a column
df.set_index(df['used_at'], inplace=True)
print (df.groupby([df['ID'], df['used_at'], pd.TimeGrouper(freq='5Min')]).sum())
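The difference is that set_index('used_at') drops the column by default, while set_index(df['used_at']) keeps used_at as a regular column too, so it can still be passed as an extra grouping key next to the TimeGrouper.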