
Groupby conditional sum of adjacent rows pandas

I have a dataframe, which has been sorted by user and by time

 import pandas as pd

 df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                    'duration': [10, 5, 5, 4, 10, 4, 6]})


   duration location user
0        10    house    A
1         5    house    A
2         5      gym    A
3         4      gym    B
4        10     shop    B
5         4      gym    B
6         6      gym    B

I only want to sum 'duration' when the 'location' values are the same across adjacent rows for a given user, so a plain df.groupby(['user','location']).duration.sum() is not enough. The desired output looks like the following, and the row order must be preserved.

duration location user
      15    house    A
       5      gym    A
       4      gym    B
      10     shop    B
      10      gym    B

Thank you!

user42361 asked Jan 12 '17 18:01




1 Answer

Pass sort=False to groupby so the groups keep the order in which they first appear in the original DataFrame, then sum the duration column within each group.

adj_check = (df.location != df.location.shift()).cumsum()
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()
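
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It makes two assumptions that are not in the snippet above: the helper key is renamed to block (a hypothetical name) so it cannot be confused with the location column, and the key is dropped after reset_index() instead of relying on as_index=False.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
    'duration': [10, 5, 5, 4, 10, 4, 6],
})

# Label each run of identical adjacent locations.
# 'block' is a hypothetical name chosen to avoid clashing with 'location'.
block = (df['location'] != df['location'].shift()).cumsum().rename('block')

result = (
    df.groupby(['user', 'location', block], sort=False)['duration']
      .sum()                  # one total per (user, location, block) group
      .reset_index()          # turn the group keys back into columns
      .drop(columns='block')  # the helper key is not part of the desired output
)
print(result)
```

Keeping 'user' in the grouping keys matters here: rows 2 and 3 are both gym and adjacent, but they belong to different users and so must not be merged.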



The only change needed to what you tried before is an extra grouping key, which gives every run of identical consecutive locations its own label:

(df.location != df.location.shift()).cumsum()
0    1
1    1
2    2
3    2
4    3
5    4
6    4
Name: location, dtype: int32
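
To see why that expression produces those labels, it may help to break it into steps. The following is only an illustrative sketch; the variable names previous, starts_run and run_id are made up for the example.

```python
import pandas as pd

location = pd.Series(
    ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'], name='location'
)

previous = location.shift()        # value from the row above (NaN for the first row)
starts_run = location != previous  # True wherever the location changes, i.e. a new run starts
run_id = starts_run.cumsum()       # counting the run starts so far gives each run a label

print(pd.DataFrame({
    'location': location,
    'previous': previous,
    'starts_run': starts_run,
    'run_id': run_id,
}))
```

Note that the labels ignore the user column, so rows 2 and 3 share the label 2 even though they belong to different users. Grouping on 'user' as well is what keeps them apart.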
Nickil Maveli answered Oct 29 '22 01:10