Groupby conditional sum of adjacent rows pandas

Question

I have a DataFrame that has been sorted by user and by time:

 import pandas as pd

 df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                    'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                    'duration': [10, 5, 5, 4, 10, 4, 6]})


   duration location user
0        10    house    A
1         5    house    A
2         5      gym    A
3         4      gym    B
4        10     shop    B
5         4      gym    B
6         6      gym    B

I only want to sum 'duration' when the 'location' values are the same across adjacent rows for a given user, so this is not just df.groupby(['user','location']).duration.sum(). The desired output is shown below. The order of the rows also matters.

duration location user
      15    house    A
       5      gym    A
       4      gym    B
      10     shop    B
      10      gym    B

Thank you!

Nickil Maveli · Accepted Answer

Pass sort=False to groupby so that the groups keep the order in which they appear in the original DataFrame, then compute the grouped sum of the duration column.

adj_check = (df.location != df.location.shift()).cumsum()
df.groupby(['user', 'location', adj_check], as_index=False, sort=False)['duration'].sum()
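For reference, here is a self-contained sketch of the same approach that can be pasted and run as-is. The names `block` and `result` are illustrative and not part of the original answer. The helper key is renamed so that it cannot clash with the `location` column, and `reset_index()` is used in place of `as_index=False` so that the helper column can be dropped explicitly.

```python
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})

# A new block starts wherever location differs from the previous row.
# 'block' is a hypothetical helper name chosen for this sketch.
block = (df['location'] != df['location'].shift()).cumsum().rename('block')

result = (
    df.groupby(['user', 'location', block], sort=False)['duration']
      .sum()
      .reset_index()            # turn the group keys back into columns
      .drop(columns='block')    # the helper key is not needed in the output
)
print(result)
```

The durations in `result` should match the desired output in the question: 15, 5, 4, 10, 10.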


The only change needed to what you tried before is an extra grouping key, which puts each run of successive rows with the same location into its own group:

(df.location != df.location.shift()).cumsum()
0    1
1    1
2    2
3    2
4    3
5    4
6    4
Name: location, dtype: int32
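Note that rows 2 and 3 share block 2 even though they belong to different users, which is why 'user' has to stay in the groupby keys. A possible variant, not from the answer above, is to start a new block whenever either the user or the location changes, so that the block id alone identifies each group. The names `loc_change`, `user_change`, `block2` and `result2` are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'user': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'location': ['house', 'house', 'gym', 'gym', 'shop', 'gym', 'gym'],
                   'duration': [10, 5, 5, 4, 10, 4, 6]})

# True wherever the value differs from the previous row (the first row is always True).
loc_change = df['location'] != df['location'].shift()
user_change = df['user'] != df['user'].shift()

# A block boundary is any row where the user or the location changes.
block2 = (loc_change | user_change).cumsum()

result2 = (
    df.groupby(block2, sort=False)
      .agg(user=('user', 'first'),
           location=('location', 'first'),
           duration=('duration', 'sum'))
      .reset_index(drop=True)   # discard the block ids
)
print(result2)
```

Both versions rely on the frame already being sorted by user and time, as stated in the question.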

Tags: python, pandas, conditional-statements

user42361
