Number of rows in a rolling window of 30 days

Tags:

python-3.x

pandas

pandas-groupby

rolling-computation

I have a sample dataframe:

Account     Date         Amount 
10          2020-06-01   100
10          2020-06-11   500
10          2020-06-21   600
10          2020-06-25   900
10          2020-07-11   1000
10          2020-07-15   600
11          2020-06-01   100
11          2020-06-11   200
11          2020-06-21   500
11          2020-06-25   1500
11          2020-07-11   2500
11          2020-07-15   6700

I want to get the number of rows in each rolling 30-day window for each account, i.e.:

Account     Date         Amount 
10          2020-06-01   1
10          2020-06-11   2
10          2020-06-21   3
10          2020-06-25   4
10          2020-07-11   4
10          2020-07-15   4
11          2020-06-01   1
11          2020-06-11   2
11          2020-06-21   3
11          2020-06-25   4
11          2020-07-11   4
11          2020-07-15   4

I have tried Grouper and resampling, but those give me the counts for each fixed 30-day period rather than rolling counts.
Thanks in advance!
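
To make the distinction concrete, here is a minimal sketch (not part of the original post) that rebuilds the sample data by hand and shows what a `pd.Grouper` with a 30-day frequency returns: one count per fixed bin rather than one per row. The variable names `df` and `fixed` are chosen here purely for illustration.

```python
import pandas as pd

dates = ["2020-06-01", "2020-06-11", "2020-06-21",
         "2020-06-25", "2020-07-11", "2020-07-15"]
df = pd.DataFrame({
    "Account": [10] * 6 + [11] * 6,
    "Date": pd.to_datetime(dates * 2),
    "Amount": [100, 500, 600, 900, 1000, 600,
               100, 200, 500, 1500, 2500, 6700],
})

# Fixed 30-day bins: each account gets one count per bin, so the result
# has fewer rows than df and cannot be assigned back row by row.
fixed = df.groupby(["Account", pd.Grouper(key="Date", freq="30D")]).size()
print(fixed)
```

A rolling count, by contrast, yields one value for every original row.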

790

asked Apr 12 '21 22:04

LeCoconutWhisperer

2 Answers

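import pandas as pd


def get_rolling_amount(grp, freq):
    # Count rows in a trailing time window; closed="both" keeps both endpoints.
    return grp.rolling(freq, on="Date", closed="both").count()


# Assumes df is sorted by Account, then Date, so the grouped result
# lines up positionally with the original rows.
df["Date"] = pd.to_datetime(df["Date"])
counts = df.groupby("Account").apply(get_rolling_amount, "30D")["Amount"]
df["Amount"] = counts.values.astype(int)
print(df)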

Prints:

    Account       Date Amount
0        10 2020-06-01      1
1        10 2020-06-11      2
2        10 2020-06-21      3
3        10 2020-06-25      4
4        10 2020-07-11      4
5        10 2020-07-15      4
6        11 2020-06-01      1
7        11 2020-06-11      2
8        11 2020-06-21      3
9        11 2020-06-25      4
10       11 2020-07-11      4
11       11 2020-07-15      4

154

answered Sep 29 '22 16:09

Andrej Kesely

You can use NumPy broadcasting within each group to count how many rows fall in the X days up to and including each row's date.

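import pandas as pd

def within_days(s, days):
    dates = s.to_numpy()
    upper = (s + pd.offsets.DateOffset(days=days)).to_numpy()
    # mask[i, j] is True when dates[i] <= dates[j] <= dates[i] + days
    mask = (dates >= dates[:, None]) & (dates <= upper[:, None])
    # Summing over i counts, for each row j, the rows in its trailing window
    return pd.Series(mask.sum(axis=0), index=s.index)

# Date must be datetime64; group_keys=False keeps the original index so the
# result aligns with df on assignment (matters on pandas 2.0 and later)
df['Amount'] = df.groupby('Account', group_keys=False)['Date'].apply(within_days, days=30)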

Prints:

    Account       Date  Amount
0        10 2020-06-01       1
1        10 2020-06-11       2
2        10 2020-06-21       3
3        10 2020-06-25       4
4        10 2020-07-11       4
5        10 2020-07-15       4
6        11 2020-06-01       1
7        11 2020-06-11       2
8        11 2020-06-21       3
9        11 2020-06-25       4
10       11 2020-07-11       4
11       11 2020-07-15       4

answered Sep 29 '22 15:09

ALollz
