How to perform a cumulative sum of distinct values in a pandas DataFrame

I have a dataframe like this:

id    date          company    ......
123   2019-01-01    A
224   2019-01-01    B
345   2019-01-01    B
987   2019-01-03    C
334   2019-01-03    C
908   2019-01-04    C
765   2019-01-04    A
554   2019-01-05    A
482   2019-01-05    D

and I want to get the cumulative number of unique values in the 'company' column over time. A company that has already appeared on an earlier date should not be counted again when it appears on a later date.

My expected output is:

date            cumulative_count
2019-01-01      2
2019-01-03      3
2019-01-04      3
2019-01-05      4

I've tried:

df.groupby(['date']).company.nunique().cumsum()

but this double-counts a company that appears on more than one date.
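To make the double counting concrete, here is a minimal sketch that rebuilds the sample above and runs the attempt. Building df by hand with only the date and company columns is an assumption for illustration; the real frame has more columns.

```python
import pandas as pd

# Sample rows from the question; only the two relevant columns are kept.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-01", "2019-01-03", "2019-01-03",
             "2019-01-04", "2019-01-04", "2019-01-05", "2019-01-05"],
    "company": ["A", "B", "B", "C", "C", "C", "A", "A", "D"],
})

# Distinct companies within each date: 2, 1, 2, 2
per_date = df.groupby("date")["company"].nunique()

# Summing those per-date counts counts A and C again on later dates: 2, 3, 5, 7
naive = per_date.cumsum()
print(naive)
```

The last value is 7 even though only four distinct companies (A, B, C, D) exist, because nunique() is evaluated separately inside each date group.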

374

asked Sep 05 '19 14:09

daragh

1 Answer

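Use duplicated + cumsum + last: flag every row whose company has already appeared, take a running total of the rows that are not flagged (the first appearances), then keep the last running total within each date.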

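# Assumes df is already sorted by date.
# True for rows whose company already appeared in an earlier row
m = df.duplicated('company')
d = df['date']

# Running count of first appearances; keep the last value for each date
(~m).cumsum().groupby(d).last()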

date
2019-01-01    2
2019-01-03    3
2019-01-04    3
2019-01-05    4
dtype: int32
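For anyone who wants to run this end to end, the following is a minimal self-contained sketch of the same idea. The hand-built df, the explicit sort, and the names first_seen and result are assumptions added for illustration; the column name cumulative_count is taken from the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question (extra columns omitted).
df = pd.DataFrame({
    "id": [123, 224, 345, 987, 334, 908, 765, 554, 482],
    "date": ["2019-01-01", "2019-01-01", "2019-01-01", "2019-01-03", "2019-01-03",
             "2019-01-04", "2019-01-04", "2019-01-05", "2019-01-05"],
    "company": ["A", "B", "B", "C", "C", "C", "A", "A", "D"],
})

# Assumption: rows must be in date order, because duplicated() treats the
# first row it encounters for each company as the "new" one.
df = df.sort_values("date", kind="stable").reset_index(drop=True)

# True on the first row of each company, False on every later row.
first_seen = ~df.duplicated("company")

result = (
    first_seen.cumsum()            # running count of distinct companies so far
    .groupby(df["date"]).last()    # value reached at the end of each date
    .rename("cumulative_count")
    .reset_index()
)
print(result)
```

The sort matters only if the rows are not already chronological. With unsorted input, a company's first row in the frame may not be its earliest date, and the running count would be attributed to the wrong day.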

141

answered Sep 28 '22 14:09

user3483203

Tags: python, datetime, pandas, dataframe, pandas-groupby
