 

Cumulative number of unique elements in a pandas DataFrame

I have a pandas DataFrame:

id tag
1  A
1  A
1  B
1  C
1  A
2  B
2  C  
2  B 
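To make the example reproducible, the frame above can be rebuilt as follows. This is a minimal sketch: the column names `id` and `tag` come from the question, and the default integer index (0 to 7) is an assumption.

```python
import pandas as pd

# Example data from the question, in its original row order
# (assumes the default RangeIndex 0..7)
df = pd.DataFrame({
    'id':  [1, 1, 1, 1, 1, 2, 2, 2],
    'tag': ['A', 'A', 'B', 'C', 'A', 'B', 'C', 'B'],
})
print(df)
```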

I want to add a column that computes, within each id, the cumulative number of unique tags seen so far. More specifically, I would like to have:

id tag count
1  A   1
1  A   1
1  B   2
1  C   3
1  A   3
2  B   1
2  C   2
2  B   2
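To pin down exactly what `count` means, here is a plain-Python reference version of the expected output above. It is a sketch, not a pandas solution: the function name `cumulative_unique_counts` is hypothetical. It keeps one set of already-seen tags per id, so it gives the same answer whether or not the rows for an id are contiguous.

```python
def cumulative_unique_counts(ids, tags):
    """Hypothetical helper: running count of distinct tags seen so far within each id."""
    seen = {}      # id -> set of tags already seen for that id
    counts = []
    for i, t in zip(ids, tags):
        tags_for_id = seen.setdefault(i, set())
        tags_for_id.add(t)              # adding a repeat tag leaves the set unchanged
        counts.append(len(tags_for_id))
    return counts

ids = [1, 1, 1, 1, 1, 2, 2, 2]
tags = ['A', 'A', 'B', 'C', 'A', 'B', 'C', 'B']
print(cumulative_unique_counts(ids, tags))  # [1, 1, 2, 3, 3, 1, 2, 2]
```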

For a given id, count will be non-decreasing. Thanks for your help!

asked Dec 24 '22 by user42361



1 Answer

I think this does what you want:

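# Number the first occurrence of each (id, tag) pair 1, 2, 3, ... within its id
unique_count = df.drop_duplicates(subset=['id', 'tag']).groupby('id').cumcount() + 1
# Repeated rows get NaN after reindexing; carry the last count forward
df['count'] = unique_count.reindex(df.index).ffill().astype(int)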
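To see why this works, here is the same idea split into its intermediate steps on the question's data. This is a sketch; the intermediate names `firsts` and `aligned` are introduced only for illustration, and `.astype(int)` is added because `reindex` introduces NaN, which turns the column into floats until it is filled.

```python
import pandas as pd

df = pd.DataFrame({
    'id':  [1, 1, 1, 1, 1, 2, 2, 2],
    'tag': ['A', 'A', 'B', 'C', 'A', 'B', 'C', 'B'],
})

# Step 1: keep only the first occurrence of each (id, tag) pair (rows 0, 2, 3, 5, 6)
firsts = df.drop_duplicates(subset=['id', 'tag'])

# Step 2: number those first occurrences within each id, starting at 1
unique_count = firsts.groupby('id').cumcount() + 1

# Step 3: align back to the full index; rows dropped in step 1 become NaN
aligned = unique_count.reindex(df.index)

# Step 4: forward-fill the NaNs from the previous row, then restore integers
df['count'] = aligned.ffill().astype(int)
print(df)
```

Step 4 is where the ordering assumption comes in: a NaN is filled from whatever row precedes it, which is only guaranteed to belong to the same id when that id's rows are contiguous.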

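The +1 is needed because cumcount numbers rows from zero. This only works if all rows for a given id are contiguous (for example, if the DataFrame is sorted by id); otherwise the forward fill can carry a count from one id into the next. Is that the case for your data? If not, you can always sort beforehand.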
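If sorting is not an option, a different technique (not the one used in this answer) avoids the ordering requirement altogether: flag the first appearance of each (id, tag) pair with `DataFrame.duplicated`, then take a running sum of those flags within each id. A minimal sketch, assuming the columns are named `id` and `tag` as in the question; the interleaved example data is made up for illustration.

```python
import pandas as pd

# Made-up data where the rows for id 1 and id 2 are interleaved, not contiguous
df = pd.DataFrame({
    'id':  [1, 2, 2, 1, 1],
    'tag': ['A', 'B', 'C', 'A', 'B'],
})

# True on the first appearance of each (id, tag) pair, False on repeats
is_new = ~df.duplicated(subset=['id', 'tag'])

# Running total of the "new tag" flags within each id, in the original row order
df['count'] = is_new.astype(int).groupby(df['id']).cumsum()
print(df)
```

On this interleaved data the `reindex` + `ffill` approach would put 2 in the fourth row (carried over from id 2) instead of the correct 1, whereas the running sum gives 1, 1, 2, 1, 2. If you do sort instead, a stable sort such as `df.sort_values('id', kind='mergesort')` keeps the original tag order within each id.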

answered Feb 09 '23 by JoeCondron
