i have a pandas data frame
id tag
1 A
1 A
1 B
1 C
1 A
2 B
2 C
2 B
I want to add a column which computes the cumulative number of unique tags over at id level. More specifically, I would like to have
id tag count
1 A 1
1 A 1
1 B 2
1 C 3
1 A 3
2 B 1
2 C 2
2 B 2
For a given id, count will be non-decreasing. Thanks for your help!
I think this does what you want:
unique_count = df.drop_duplicates().groupby('id').cumcount() + 1
unique_count.reindex(df.index).ffill()
The +1 is because the count starts at zero. This only works if the dataframe is sorted by id. Was that intended? You can always sort beforehand.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With