I have a DataFrame that has been sorted by user and by time:
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})
   duration location user
0        10    house    A
1         5    house    A
2         5      gym    A
3         4      gym    B
4        10     shop    B
5         4      gym    B
6         6      gym    B
I only want to apply sum() when the 'location' values are the same across adjacent rows for a given user, so it is not just df.groupby(['user', 'location']).duration.sum(). The desired output looks like the following. The order of the rows is also important.
duration location user
      15    house    A
       5      gym    A
       4      gym    B
      10     shop    B
      10      gym    B
Thank you!
Pass sort=False to keep the groups in the order in which they appear in the original DataFrame, then compute the grouped sum of the duration column.
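# Label each run of consecutive identical locations with an increasing integer
adj_check = (df.location != df.location.shift()).cumsum()
# Group by user, location and run label; sort=False keeps the original order
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()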
The only change from what you tried is this extra grouping key, which gives every run of successive rows with the same location its own label:
(df.location != df.location.shift()).cumsum()

0    1
1    1
2    2
3    2
4    3
5    4
6    4
Name: location, dtype: int32
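Note that rows 2 and 3 get the same label because both are gym, even though they belong to different users; they only end up in separate groups because user is also a grouping key. As a minimal sketch of a variant (not the exact code above), the run boundary can be made explicit by starting a new run whenever either the user or the location changes. The names `changed`, `run_id` and `result` are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
    'duration': [10, 5, 5, 4, 10, 4, 6],
})

# True on every row where the user or the location differs from the previous row
changed = (df['user'] != df['user'].shift()) | (df['location'] != df['location'].shift())

# The cumulative sum turns those flags into one integer label per run (hypothetical helper column)
df_runs = df.assign(run_id=changed.cumsum())

# One output row per run: keep the run's user and location, sum its durations
result = (
    df_runs.groupby('run_id', sort=False)
    .agg(user=('user', 'first'),
         location=('location', 'first'),
         duration=('duration', 'sum'))
    .reset_index(drop=True)
)

print(result)
```

Grouping on the single run label keeps the helper out of the final columns and avoids relying on how a grouper that shares its name with an existing column is handled.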