Python - rolling functions for GroupBy object

People also ask

How do you use the roll function in Python?

rolling() function to find the rolling window sum of the underlying data for the given Series object. The size of the rolling window should be 2 and the rolling window type should be 'triang'. Output : Now we will use Series.

How do I turn a Groupby object into a list?

groupby() To Group Rows into List. By using DataFrame. gropby() function you can group rows on a column, select the column you want as a list from the grouped result and finally convert it to a list for each group using apply(list).

Can you sort a Groupby object?

Sort within Groups of groupby() Result in DataFrameBy using DataFrame. sort_values() , you can sort DataFrame in ascending or descending order, before you use this first group the DataFrame rows by using DataFrame. groupby() method. Note that groupby preserves the order of rows within each group.

For the Googlers who come upon this old question:

Regarding @kekert's comment on @Garrett's answer to use the new

df.groupby('id')['x'].rolling(2).mean()

rather than the now-deprecated

df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)

curiously, it seems that the new .rolling().mean() approach returns a multi-indexed series, indexed by the group_by column first and then the index. Whereas, the old approach would simply return a series indexed singularly by the original df index, which perhaps makes less sense, but made it very convenient for adding that series as a new column into the original dataframe.

So I think I've figured out a solution that uses the new rolling() method and still works the same:

df.groupby('id')['x'].rolling(2).mean().reset_index(0,drop=True)

which should give you the series

which you can add as a column:

df['x'] = df.groupby('id')['x'].rolling(2).mean().reset_index(0,drop=True)

cumulative sum

To answer the question directly, the cumsum method would produced the desired series:

In [17]: df
Out[17]:
  id  x
0  a  0
1  a  1
2  a  2
3  b  3
4  b  4
5  b  5

In [18]: df.groupby('id').x.cumsum()
Out[18]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x, dtype: int64

pandas rolling functions per group

More generally, any rolling function can be applied to each group as follows (using the new .rolling method as commented by @kekert). Note that the return type is a multi-indexed series, which is different from previous (deprecated) pd.rolling_* methods.

In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
Out[10]:
id
a   0   0.00
    1   1.00
    2   3.00
b   3   3.00
    4   7.00
    5   9.00
Name: x, dtype: float64

To apply the per-group rolling function and receive result in original dataframe order, transform should be used instead:

In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
Out[16]:
0    0
1    1
2    3
3    3
4    7
5    9
Name: x, dtype: int64

deprecated approach

For reference, here's how the now deprecated pandas.rolling_mean behaved:

In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]: 
0    0.0
1    0.5
2    1.5
3    3.0
4    3.5
5    4.5

Here is another way that generalizes well and uses pandas' expanding method.

It is very efficient and also works perfectly for rolling window calculations with fixed windows, such as for time series.

# Import pandas library
import pandas as pd

# Prepare columns
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']

# Create dataframe from columns above
df = pd.DataFrame({'id':id, 'x':x})

# Calculate rolling sum with infinite window size (i.e. all rows in group) using "expanding"
df['rolling_sum'] = df.groupby('id')['x'].transform(lambda x: x.expanding().sum())

# Output as desired by original poster
print(df)
  id  x  rolling_sum
0  a  0            0
1  a  1            1
2  a  2            3
3  b  3            3
4  b  4            7
5  b  5           12

I'm not sure of the mechanics, but this works. Note, the returned value is just an ndarray. I think you could apply any cumulative or "rolling" function in this manner and it should have the same result.

I have tested it with cumprod, cummax and cummin and they all returned an ndarray. I think pandas is smart enough to know that these functions return a series and so the function is applied as a transformation rather than an aggregation.

In [35]: df.groupby('id')['x'].cumsum()
Out[35]:
0     0
1     1
2     3
3     3
4     7
5    12

Edit: I found it curious that this syntax does return a Series:

In [54]: df.groupby('id')['x'].transform('cumsum')
Out[54]:
0     0
1     1
2     3
3     3
4     7
5    12
Name: x

Related questions
                            
                                What is the advantage of the new print function in Python 3.x over the Python 2 print statement?
                            
                                Pandas add one day to column
                            
                                Eliminating warnings from scikit-learn [duplicate]
                            
                                Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC
                            
                                SQLAlchemy + SQL Injection
                            
                                Python: best practice and securest way to connect to MySQL and execute queries
                            
                                Extending python - to swig, not to swig or Cython
                            
                                creating over 20 unique legend colors using matplotlib
                            
                                Filtering multiple items in a multi-index Python Panda dataframe
                            
                                Is there any difference between "string" and 'string' in Python? [duplicate]
                            
                                Possible to share in-memory data between 2 separate processes?
                            
                                Shortcut key for commenting out lines of Python code in Spyder
                            
                                Installing OpenCV fails because it cannot find "skbuild"
                            
                                How can I tell where my python script is hanging?
                            
                                The problem with installing PIL using virtualenv or buildout
                            
                                How to get image width and height in OpenCV? [duplicate]
                            
                                JSON output sorting in Python
                            
                                Installing MySQL-python
                            
                                How can I change the font size using seaborn FacetGrid?
                            
                                Builder pattern equivalent in Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python - rolling functions for GroupBy object

Tags:

python

pandas

pandas-groupby

rolling-computation

rolling-sum

People also ask

cumulative sum

pandas rolling functions per group

deprecated approach

Recent Activity

Donate For Us