Let's say I have a df:
import pandas as pd
df = pd.DataFrame(
    {'name': ['pam', 'pam', 'bob', 'bob', 'pam', 'bob', 'pam', 'bob'],
     'game_id': [0, 0, 1, 1, 0, 2, 1, 2]
    }
)
name game_id
0 pam 0
1 pam 0
2 bob 1
3 bob 1
4 pam 0
5 bob 2
6 pam 1
7 bob 2
I want to calculate how many games bob and pam have appeared in cumulatively. However, when I use .groupby() and .cumcount()+1, I get something different. I get a cumulative count within each game_id:
df['games'] = df.groupby(['name','game_id']).cumcount()+1
name game_id games
0 pam 0 1
1 pam 0 2
2 bob 1 1
3 bob 1 2
4 pam 0 3
5 bob 2 1
6 pam 1 1
7 bob 2 2
What I really want is one total cumulative count of the distinct games per name, rather than a cumulative count for each unique game_id. Here's an example of my desired output:
name game_id games
0 pam 0 1
1 pam 0 1
2 bob 1 1
3 bob 1 1
4 pam 0 1
5 bob 2 2
6 pam 1 2
7 bob 2 2
Note: in my actual dataset, game_id is a random sequence of numbers.
One-line alternative.
The two essential steps performed are:
1. df.duplicated on ['name', 'game_id'], negated with ~, to flag the first row of each name/game_id pair
2. groupby.cumsum on the step #1 column to get the cumulative unique count
df['games'] = df.assign(temp=~df.duplicated(subset=['name','game_id'])).groupby('name')['temp'].cumsum()
name game_id games
0 pam 0 1
1 pam 0 1
2 bob 1 1
3 bob 1 1
4 pam 0 1
5 bob 2 2
6 pam 1 2
7 bob 2 2
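To make the one-liner easier to follow, here is a sketch that splits it into its two steps, using the sample frame from the question. The intermediate names `is_first` and `games` are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['pam', 'pam', 'bob', 'bob', 'pam', 'bob', 'pam', 'bob'],
    'game_id': [0, 0, 1, 1, 0, 2, 1, 2],
})

# Step 1: duplicated() marks every repeat of a (name, game_id) pair,
# so negating it leaves True only on the first row of each pair.
is_first = ~df.duplicated(subset=['name', 'game_id'])

# Step 2: a running sum of those flags within each name counts
# how many distinct games that name has appeared in so far.
games = is_first.groupby(df['name']).cumsum()

df['games'] = games
print(df)
```

Because `duplicated` compares each row against every earlier row rather than only the previous one, a `game_id` that shows up again later for the same name is not counted twice, and the values of `game_id` do not need to be ordered or consecutive.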