Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to perform a cumulative sum of distinct values in pandas dataframe

I have a dataframe like this:

id    date         company    ......
123   2019-01-01        A
224   2019-01-01        B
345   2019-01-01        B
987   2019-01-03        C
334   2019-01-03        C
908   2019-01-04        C
765   2019-01-04        A
554   2019-01-05        A
482   2019-01-05        D

and I want to get the cumulative number of unique values over time for the 'company' column. So if a company appears at a later date they are not counted again.

My expected output is:

date            cumulative_count
2019-01-01      2
2019-01-03      3
2019-01-04      3
2019-01-05      4

I've tried:

df.groupby(['date']).company.nunique().cumsum()

but this double counts if the same company appears on a different date.

like image 374
daragh Avatar asked Sep 05 '19 14:09

daragh


People also ask

How do you do a cumulative sum in pandas?

Pandas Series: cumsum() function The cumsum() function is used to get cumulative sum over a DataFrame or Series axis. Returns a DataFrame or Series of the same size containing the cumulative sum. The index or the name of the axis. 0 is equivalent to None or 'index'.

How do you do a cumulative sum list in Python?

We declare an empty list cum_list to which we will append elements to form the cumulative sum list. Initialize a sum variable sm=0. Start iterating over the input list, with each iteration we increment the sum value to previous value+ the current element. On each iteration, the sum value is appended to the cum_list.

How do I sum specific values in pandas?

sum() function returns the sum of the values for the requested axis. Parameters: axis : {index (0), columns (1)}

How do you count distinct in a DataFrame?

You can use the nunique() function to count the number of unique values in a pandas DataFrame.


1 Answers

Using duplicated + cumsum + last

m = df.duplicated('company')
d = df['date']

(~m).cumsum().groupby(d).last()

date
2019-01-01    2
2019-01-03    3
2019-01-04    3
2019-01-05    4
dtype: int32
like image 141
user3483203 Avatar answered Sep 28 '22 14:09

user3483203