
Start counting at zero by group

Tags: python, pandas

Consider the following dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({'group': list('aaabbabc')})
>>> df
  group
0     a
1     a
2     a
3     b
4     b
5     a
6     b
7     c

I want to count the cumulative number of times each group has occurred. My desired output looks like this:

>>> df
  group  n
0     a  0
1     a  1
2     a  2
3     b  0
4     b  1
5     a  3
6     b  2
7     c  0

My initial approach was to do something like this:

df['n'] = df.groupby('group').apply(lambda x: list(range(x.shape[0])))

Basically, this assigns a zero-indexed array of length n to each group, but the result has proven difficult to transpose and join back onto the dataframe.

Carter Masterson asked Jan 03 '23 10:01



2 Answers

You can use groupby + cumcount, and horizontally concat the new column:

>>> pd.concat([df, df.group.groupby(df.group).cumcount()], axis=1).rename(columns={0: 'n'})
  group  n
0     a  0
1     a  1
2     a  2
3     b  0
4     b  1
5     a  3
6     b  2
7     c  0
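
As a possible variation on the one-liner above, here is a minimal sketch that gets the same frame with DataFrame.assign instead of concat + rename. The variable name result is only illustrative; like concat, assign returns a new dataframe and leaves df untouched.

```python
import pandas as pd

df = pd.DataFrame({'group': list('aaabbabc')})

# cumcount numbers the rows of each group 0, 1, 2, ... in their original order;
# assign attaches that Series as column 'n' on a copy of df
result = df.assign(n=df.groupby('group').cumcount())
print(result)
```

This avoids the rename step, since the new column is named directly in the call.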
Ami Tavory answered Jan 05 '23 00:01



Simply group by the column name (here, group), call cumcount, and assign the result to a new column in the dataframe.

df['n'] = df.groupby('group').cumcount()

  group  n
0     a  0
1     a  1
2     a  2
3     b  0
4     b  1
5     a  3
6     b  2
7     c  0
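
For anyone who needs a different numbering, a small sketch of two variations on the same call follows. The column names n_from_one and n_remaining are made up for illustration; the ascending parameter is part of cumcount itself.

```python
import pandas as pd

df = pd.DataFrame({'group': list('aaabbabc')})

counts = df.groupby('group').cumcount()

# zero-based count, as asked in the question
df['n'] = counts

# one-based count: shift the zero-based result by one
df['n_from_one'] = counts + 1

# ascending=False counts down, so the last row of each group gets 0
df['n_remaining'] = df.groupby('group').cumcount(ascending=False)

print(df)
```

The result of cumcount keeps the original row index, which is why it can be assigned straight back to df without any join.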
Pankaj answered Jan 04 '23 23:01
