Cumulative number of unique elements in a pandas DataFrame

Question

I have a pandas DataFrame:

id tag
1  A
1  A
1  B
1  C
1  A
2  B
2  C  
2  B

I want to add a column that holds the cumulative number of unique tags seen so far within each id. More specifically, I would like to have

id tag count
1  A   1
1  A   1
1  B   2
1  C   3
1  A   3
2  B   1
2  C   2
2  B   2

For a given id, count will be non-decreasing. Thanks for your help!

JoeCondron · Accepted Answer

I think this does what you want:

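# Number each first occurrence of an (id, tag) row within its id, starting at 1
unique_count = df.drop_duplicates().groupby('id').cumcount() + 1
# Repeated rows are missing after reindexing, so carry the last count forward
df['count'] = unique_count.reindex(df.index).ffill().astype(int)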

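The +1 is needed because cumcount starts at zero. This only works if the rows for each id are contiguous (for example, the dataframe is sorted by id), because ffill copies the value from the row directly above. Is that the case for your data? If not, you can always sort by id beforehand.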
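
If sorting is inconvenient, here is a variant that should not depend on row order. Treat it as a minimal sketch rather than a definitive solution: it assumes the columns are named id and tag as in the question, and is_first is just an illustrative name. It flags the first occurrence of each (id, tag) pair with duplicated and then takes a running sum of those flags within each id.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 1, 2, 2, 2],
    "tag": ["A", "A", "B", "C", "A", "B", "C", "B"],
})

# 1 where an (id, tag) pair appears for the first time, 0 for repeats.
# is_first is a hypothetical helper name, not part of the original answer.
is_first = (~df.duplicated(["id", "tag"])).astype(int)

# Running total of first occurrences within each id.
df["count"] = is_first.groupby(df["id"]).cumsum()

print(df)
```

Because the running sum is computed per group, the rows of different ids can be interleaved and each id still gets its own non-decreasing count.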
