
Pandas groupby cumcount - one cumulative count rather than a cumulative count for each unique value

Let's say I have a df:

import pandas as pd

df = pd.DataFrame(
    {'name':['pam','pam','bob','bob','pam','bob','pam','bob'],
     'game_id':[0,0,1,1,0,2,1,2]
     }
    )


  name  game_id
0  pam        0
1  pam        0
2  bob        1
3  bob        1
4  pam        0
5  bob        2
6  pam        1
7  bob        2

I want to calculate how many games bob and pam have each appeared in, cumulatively. However, when I use .groupby() and .cumcount()+1, I get something different: a running row count within each (name, game_id) pair:

df['games'] = df.groupby(['name','game_id']).cumcount()+1

  name  game_id  games
0  pam        0      1
1  pam        0      2
2  bob        1      1
3  bob        1      2
4  pam        0      3
5  bob        2      1
6  pam        1      1
7  bob        2      2

What I really want is a single cumulative count of the distinct games each name has appeared in, rather than a count that restarts for each unique game_id. Here's an example of my desired output:

  name  game_id  games
0  pam        0      1
1  pam        0      1
2  bob        1      1
3  bob        1      1
4  pam        0      1
5  bob        2      2
6  pam        1      2
7  bob        2      2

Note: in my actual dataset, game_id is a random sequence of numbers.

bismo asked Oct 19 '25 04:10


1 Answer

One-line alternative.

The two essential steps are:

  1. Flag the first occurrence of each (name, game_id) pair by negating df.duplicated
  2. Apply groupby.cumsum to the flags from step 1, grouped by name, to get the cumulative unique count

df['games'] = df.assign(temp=~df.duplicated(subset=['name','game_id'])).groupby('name')['temp'].cumsum()

  name  game_id  games
0  pam        0      1
1  pam        0      1
2  bob        1      1
3  bob        1      1
4  pam        0      1
5  bob        2      2
6  pam        1      2
7  bob        2      2
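
If the one-liner is hard to read, here is a minimal sketch that splits it into the two steps described above. The helper name cumulative_games, the variable first_seen, and the second frame with non-sequential game_id values (9041, 17, 530) are made up for illustration and are not part of the question. The .astype(int) cast is an extra precaution I added so the running sum is an integer column; the one-liner above does not use it.

```python
import pandas as pd


def cumulative_games(frame):
    """Hypothetical helper: running count of distinct game_id values per name."""
    # Step 1: True on the first row of each (name, game_id) pair, False on repeats.
    first_seen = ~frame.duplicated(subset=["name", "game_id"])
    # Step 2: a per-name running sum of those flags counts the distinct games so far.
    return first_seen.astype(int).groupby(frame["name"]).cumsum()


names = ["pam", "pam", "bob", "bob", "pam", "bob", "pam", "bob"]

# The frame from the question.
df = pd.DataFrame({"name": names, "game_id": [0, 0, 1, 1, 0, 2, 1, 2]})
df["games"] = cumulative_games(df)

# Same appearance pattern, but with arbitrary (assumed) ids instead of 0, 1, 2.
shuffled_ids = pd.DataFrame(
    {"name": names, "game_id": [9041, 9041, 17, 17, 9041, 530, 17, 530]}
)
shuffled_ids["games"] = cumulative_games(shuffled_ids)

print(df)
print(shuffled_ids)
```

Because df.duplicated only checks whether a (name, game_id) pair has been seen before, the actual id values and their ordering should not matter, which fits the note in the question about game_id being a random sequence of numbers.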
StevenS answered Oct 21 '25 17:10


