
How to calculate cumulative groupby counts in Pandas with point in time?

I have a df that contains multiple weekly snapshots of JIRA tickets. I want to calculate the YTD counts of tickets.

The df looks like this:

pointInTime   ticketId
2008-01-01         111
2008-01-01         222
2008-01-01         333
2008-01-07         444
2008-01-07         555
2008-01-07         666
2008-01-14         777
2008-01-14         888
2008-01-14         999

With df.groupby(['pointInTime'])['ticketId'].count() I can get the count of IDs in each snapshot. But what I want to achieve is to calculate the cumulative sum,

and end up with a df that looks like this:

pointInTime   ticketId   cumCount
2008-01-01         111   3
2008-01-01         222   3
2008-01-01         333   3
2008-01-07         444   6
2008-01-07         555   6
2008-01-07         666   6
2008-01-14         777   9
2008-01-14         888   9
2008-01-14         999   9

So for 2008-01-07, the number of tickets would be the count for 2008-01-07 plus the count for 2008-01-01.
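For reference, the sample frame above can be reconstructed like this, and the per-snapshot counts confirmed (a minimal sketch using the values from the question):

```python
import pandas as pd

# Rebuild the weekly-snapshot sample from the question
df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 3 + ["2008-01-07"] * 3 + ["2008-01-14"] * 3,
    "ticketId": [111, 222, 333, 444, 555, 666, 777, 888, 999],
})

# Count of ticket IDs in each snapshot: 3 per week
per_snapshot = df.groupby("pointInTime")["ticketId"].count()
print(per_snapshot)
```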

asked Jun 18 '19 14:06 by bossangelo


3 Answers

Use GroupBy.count and cumsum, then map the result back to "pointInTime":

df['cumCount'] = (
    df['pointInTime'].map(df.groupby('pointInTime')['ticketId'].count().cumsum()))
df

  pointInTime  ticketId  cumCount
0  2008-01-01       111         3
1  2008-01-01       222         3
2  2008-01-01       333         3
3  2008-01-07       444         6
4  2008-01-07       555         6
5  2008-01-07       666         6
6  2008-01-14       777         9
7  2008-01-14       888         9
8  2008-01-14       999         9
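The map works because count().cumsum() produces a running total indexed by pointInTime, which map then looks up for each row (sketch assuming the sample data from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 3 + ["2008-01-07"] * 3 + ["2008-01-14"] * 3,
    "ticketId": [111, 222, 333, 444, 555, 666, 777, 888, 999],
})

# Running total of tickets, indexed by snapshot date: 3, 6, 9
running = df.groupby("pointInTime")["ticketId"].count().cumsum()

# map() looks each row's pointInTime up in that index
df["cumCount"] = df["pointInTime"].map(running)
```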
answered Nov 15 '22 04:11 by cs95

I am using value_counts:

df.pointInTime.map(df.pointInTime.value_counts().sort_index().cumsum())
Out[207]: 
0    3
1    3
2    3
3    6
4    6
5    6
6    9
7    9
8    9
Name: pointInTime, dtype: int64

Or

pd.Series(np.arange(len(df))+1,index=df.index).groupby(df['pointInTime']).transform('last')
Out[216]: 
0    3
1    3
2    3
3    6
4    6
5    6
6    9
7    9
8    9
dtype: int32
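Both snippets return a Series aligned to df.index, so either result can be assigned straight to a new column. Note that the second, position-based variant numbers rows by position, so it assumes the frame is already sorted by pointInTime; the value_counts variant does not (a sketch under that assumption, using the question's sample data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 3 + ["2008-01-07"] * 3 + ["2008-01-14"] * 3,
    "ticketId": [111, 222, 333, 444, 555, 666, 777, 888, 999],
})

# value_counts approach: sort_index keeps the cumsum in chronological order
by_counts = df["pointInTime"].map(
    df["pointInTime"].value_counts().sort_index().cumsum()
)

# position approach: the last 1-based row position within each group is the
# cumulative count -- valid only when rows are sorted by pointInTime
by_position = (
    pd.Series(np.arange(len(df)) + 1, index=df.index)
    .groupby(df["pointInTime"])
    .transform("last")
)

df["cumCount"] = by_counts  # both agree on this sorted frame
```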
answered Nov 15 '22 04:11 by BENY


Here's an approach transforming with the size and multiplying by the result of taking pd.factorize on pointInTime:

df['cumCount'] = (df.groupby('pointInTime').ticketId
                    .transform('size')
                    .mul(pd.factorize(df.pointInTime)[0]+1))

 pointInTime  ticketId  cumCount
0  2008-01-01       111         3
1  2008-01-01       222         3
2  2008-01-01       333         3
3  2008-01-07       444         6
4  2008-01-07       555         6
5  2008-01-07       666         6
6  2008-01-14       777         9
7  2008-01-14       888         9
8  2008-01-14       999         9
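One caveat worth flagging: size multiplied by (factorize ordinal + 1) equals the cumulative count only when every snapshot contains the same number of tickets, as in the sample data. With unequal group sizes it diverges from the true running total (editor's check, using hypothetical unequal snapshots):

```python
import pandas as pd

# Two snapshots of unequal size: 2 tickets, then 3 tickets
df = pd.DataFrame({
    "pointInTime": ["2008-01-01"] * 2 + ["2008-01-07"] * 3,
    "ticketId": [111, 222, 444, 555, 666],
})

# size-times-ordinal shortcut from this answer
shortcut = (
    df.groupby("pointInTime")["ticketId"].transform("size")
    .mul(pd.factorize(df["pointInTime"])[0] + 1)
)

# true cumulative count via count().cumsum()
true_cum = df["pointInTime"].map(
    df.groupby("pointInTime")["ticketId"].count().cumsum()
)

print(shortcut.tolist())  # [2, 2, 6, 6, 6]
print(true_cum.tolist())  # [2, 2, 5, 5, 5]
```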
answered Nov 15 '22 02:11 by yatu