Pandas: GroupBy Shift And Cumulative Sum

I want to do a groupby, shift and cumsum, which seems like a pretty trivial task, but I'm still banging my head over the result I'm getting. Can someone please tell me what I am doing wrong? All the results I found online show the same thing, or a variation of what I am doing. Below is my implementation.

import pandas as pd

temp = pd.DataFrame(data=[['a',1],['a',1],['a',1],['b',1],['b',1],['b',1],['c',1],['c',1]], columns=['ID','X'])

temp['transformed'] = temp.groupby('ID')['X'].cumsum().shift()
print(temp)

   ID   X   transformed
0   a   1   NaN
1   a   1   1.0
2   a   1   2.0
3   b   1   3.0
4   b   1   1.0
5   b   1   2.0
6   c   1   3.0
7   c   1   1.0

This is wrong. The result I am looking for is below:

   ID   X   transformed
0   a   1   NaN
1   a   1   1.0
2   a   1   2.0
3   b   1   NaN
4   b   1   1.0
5   b   1   2.0
6   c   1   NaN
7   c   1   1.0

Thanks a lot in advance.

asked Mar 04 '19 23:03

Krishnang K Dalal

2 Answers

While working on this problem, I noticed that as the DataFrame grows, passing lambdas to transform becomes very slow. The built-in DataFrameGroupBy methods (such as cumsum and shift) are much faster than lambdas.

So here's my proposed solution: create a 'temp' column holding the cumsum for each ID, then shift it in a second groupby (df below is the question's DataFrame, which is named temp there):

df['temp'] = df.groupby("ID")['X'].cumsum()
df['transformed'] = df.groupby("ID")['temp'].shift()
df = df.drop(columns=["temp"])
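
To see why the second groupby matters, here is a minimal sketch on the question's data. The names leaky, running and per_group are only illustrative, and grouping the intermediate Series by temp['ID'] is one way to avoid the helper column, not something the answer above prescribes.

```python
import pandas as pd

temp = pd.DataFrame(
    data=[['a', 1], ['a', 1], ['a', 1], ['b', 1], ['b', 1], ['b', 1], ['c', 1], ['c', 1]],
    columns=['ID', 'X'],
)

# Original attempt: cumsum is computed per group, but it returns a plain
# Series, so the following shift() moves values across group boundaries.
leaky = temp.groupby('ID')['X'].cumsum().shift()

# Grouped version: both the cumsum and the shift are done per ID.
running = temp.groupby('ID')['X'].cumsum()
per_group = running.groupby(temp['ID']).shift()

print(leaky.tolist())      # row 3 (first 'b' row) carries over 3.0 from group 'a'
print(per_group.tolist())  # the first row of every group is NaN
```

The first row of each ID is NaN in per_group because nothing precedes it inside its own group.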

answered Sep 18 '22 11:09

Kazu

You could use transform() to pass each group created by groupby through the cumsum() and shift() methods separately.

temp['transformed'] = \
    temp.groupby('ID')['X'].transform(lambda x: x.cumsum().shift())

  ID  X   transformed
0  a  1   NaN
1  a  1   1.0
2  a  1   2.0
3  b  1   NaN
4  b  1   1.0
5  b  1   2.0
6  c  1   NaN
7  c  1   1.0
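
As a quick sanity check, the sketch below runs the transform() approach on a small frame whose IDs are interleaved rather than sorted, and compares it with chaining the built-in grouped cumsum and shift. The frame and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical data: groups 'a' and 'b' are interleaved on purpose.
df = pd.DataFrame({'ID': ['a', 'b', 'a', 'b', 'a'], 'X': [1, 2, 3, 4, 5]})

# Lambda version: each group's Series goes through cumsum() and shift().
via_transform = df.groupby('ID')['X'].transform(lambda x: x.cumsum().shift())

# Built-in version: grouped cumsum, then a grouped shift on the result.
via_builtin = df.groupby('ID')['X'].cumsum().groupby(df['ID']).shift()

# Raises an AssertionError if the two results differ.
pd.testing.assert_series_equal(via_transform, via_builtin, check_names=False)
print(via_transform.tolist())
```

Both versions keep the original row order, so the result can be assigned straight back as a new column.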

For more info on transform() please see here:

https://jakevdp.github.io/PythonDataScienceHandbook/03.08-aggregation-and-grouping.html#Transformation
https://pandas.pydata.org/pandas-docs/version/0.22/groupby.html#transformation

answered Sep 17 '22 11:09

leerssej

Tags: python-3.x, pandas, pandas-groupby
