Consider the following dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame({'group': list('aaabbabc')})
>>> df
group
0 a
1 a
2 a
3 b
4 b
5 a
6 b
7 c
I want to count the cumulative number of times each group has occurred. My desired output looks like this:
>>> df
group n
0 a 0
1 a 1
2 a 2
3 b 0
4 b 1
5 a 3
6 b 2
7 c 0
My initial approach was to do something like this:
df['n'] = df.groupby('group').apply(lambda x: list(range(x.shape[0])))
Basically assigning a length n array, zero-indexed, to each group. But that has proven difficult to transpose and join.
You can use groupby
+ cumcount
, and horizontally concat
the new column:
>>> pd.concat([df, df.group.groupby(df.group).cumcount()], axis=1).rename(columns={0: 'n'})
group n
0 a 0
1 a 1
2 a 2
3 b 0
4 b 1
5 a 3
6 b 2
7 c 0
Simply use groupby
on column name, in this case group
and then apply cumcount
and finally add a column in dataframe with the result.
df['n']=df.groupby('group').cumcount()
group n
0 a 0
1 a 1
2 a 2
3 b 0
4 b 1
5 a 3
6 b 2
7 c 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With