
Number of rows in a rolling window of 30 days

I have a sample dataframe:

Account     Date         Amount 
10          2020-06-01   100
10          2020-06-11   500
10          2020-06-21   600
10          2020-06-25   900
10          2020-07-11   1000
10          2020-07-15   600
11          2020-06-01   100
11          2020-06-11   200
11          2020-06-21   500
11          2020-06-25   1500
11          2020-07-11   2500
11          2020-07-15   6700
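
For reproducibility, the sample can be built along these lines. This is only a sketch: the variable name df is an assumption (it is the name the answers use), and converting Date to datetime up front is a convenience rather than something stated in the question.

```python
import pandas as pd

# Six dated rows per account, matching the table above.
df = pd.DataFrame(
    {
        "Account": [10] * 6 + [11] * 6,
        "Date": ["2020-06-01", "2020-06-11", "2020-06-21",
                 "2020-06-25", "2020-07-11", "2020-07-15"] * 2,
        "Amount": [100, 500, 600, 900, 1000, 600,
                   100, 200, 500, 1500, 2500, 6700],
    }
)

# Time-based rolling windows need a real datetime column, not strings.
df["Date"] = pd.to_datetime(df["Date"])
```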

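I want to get the number of rows in each rolling 30-day window for each account, i.e. for every row, the count of rows for the same account dated within the 30 days up to and including that row: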

Account     Date         Amount 
10          2020-06-01   1
10          2020-06-11   2
10          2020-06-21   3
10          2020-06-25   4
10          2020-07-11   4
10          2020-07-15   4
11          2020-06-01   1
11          2020-06-11   2
11          2020-06-21   3
11          2020-06-25   4
11          2020-07-11   4
11          2020-07-15   4

I have tried Grouper and resampling, but those give me the counts for fixed 30-day bins, not the rolling counts.
Thanks in advance!
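
To illustrate the distinction drawn above: resampling counts rows in fixed, non-overlapping 30-day bins, whereas a rolling window is re-evaluated at every row. A minimal sketch using one account's dates (the names s, binned and rolling are illustrative and not part of the question):

```python
import pandas as pd

dates = pd.to_datetime(
    ["2020-06-01", "2020-06-11", "2020-06-21",
     "2020-06-25", "2020-07-11", "2020-07-15"]
)
s = pd.Series(1, index=dates)

# Fixed bins: one count per 30-day bucket, starting at the first day.
binned = s.resample("30D").count()

# Rolling window: one count per row, looking back 30 days from that row
# (closed="both" includes the row exactly 30 days earlier).
rolling = s.rolling("30D", closed="both").count().astype(int)

print(binned.tolist())   # two buckets
print(rolling.tolist())  # one value per original row
```

The binned result has one entry per bucket, while the rolling result keeps one entry per input row, which is the shape of the desired output.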

LeCoconutWhisperer asked Apr 12 '21 22:04



2 Answers

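import pandas as pd

def get_rolling_amount(grp, freq):
    # Count the rows in the trailing window ending at each Date;
    # closed="both" also includes the row exactly `freq` earlier.
    return grp.rolling(freq, on="Date", closed="both")["Amount"].count()


df["Date"] = pd.to_datetime(df["Date"])
# Selecting "Amount" keeps the result one-dimensional. .values assigns by
# position, so df is assumed to be sorted by Account (and by Date within it).
df["Amount"] = df.groupby("Account").apply(get_rolling_amount, "30D").values.astype(int)
print(df)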

Prints:

    Account       Date Amount
0        10 2020-06-01      1
1        10 2020-06-11      2
2        10 2020-06-21      3
3        10 2020-06-25      4
4        10 2020-07-11      4
5        10 2020-07-15      4
6        11 2020-06-01      1
7        11 2020-06-11      2
8        11 2020-06-21      3
9        11 2020-06-25      4
10       11 2020-07-11      4
11       11 2020-07-15      4
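
A note on closed="both": for offset-based windows pandas defaults to closed="right", which leaves out a row dated exactly 30 days earlier. The sketch below (account 10's dates; the variable names are illustrative) shows where the two settings diverge:

```python
import pandas as pd

dates = pd.to_datetime(
    ["2020-06-01", "2020-06-11", "2020-06-21",
     "2020-06-25", "2020-07-11", "2020-07-15"]
)
s = pd.Series(1, index=dates)

# Default for time-based windows: (t - 30D, t], left edge excluded.
right = s.rolling("30D").count().astype(int).tolist()

# Both edges included: [t - 30D, t].
both = s.rolling("30D", closed="both").count().astype(int).tolist()

print(right)  # [1, 2, 3, 4, 3, 4]
print(both)   # [1, 2, 3, 4, 4, 4]
```

The only difference is at 2020-07-11, whose window reaches back to exactly 2020-06-11; including that boundary row is what reproduces the counts requested in the question.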
Andrej Kesely answered Sep 29 '22 16:09


You can use broadcasting within each group to count, for every row, how many rows of that group fall within the X days up to and including it.

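import pandas as pd

def within_days(s, days):
    # Pairwise comparison via broadcasting: cell (i, j) is True when row j's
    # date lies in [row i's date, row i's date + days]. Summing each column
    # counts the rows dated within `days` before (and including) row j.
    dates = s.to_numpy()
    upper = (s + pd.offsets.DateOffset(days=days)).to_numpy()
    arr = ((dates >= dates[:, None]) & (dates <= upper[:, None])).sum(axis=0)
    return pd.Series(arr, index=s.index)

df['Date'] = pd.to_datetime(df['Date'])  # the comparison needs datetimes, not strings
# group_keys=False keeps the original row index so the result aligns with df
df['Amount'] = df.groupby('Account', group_keys=False)['Date'].apply(within_days, days=30)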

    Account       Date  Amount
0        10 2020-06-01       1
1        10 2020-06-11       2
2        10 2020-06-21       3
3        10 2020-06-25       4
4        10 2020-07-11       4
5        10 2020-07-15       4
6        11 2020-06-01       1
7        11 2020-06-11       2
8        11 2020-06-21       3
9        11 2020-06-25       4
10       11 2020-07-11       4
11       11 2020-07-15       4
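
The broadcasting step can be seen in isolation on a tiny array. This is a sketch in plain NumPy with made-up variable names (lower, mask, counts), not a replacement for the function above:

```python
import numpy as np

dates = np.array(["2020-06-01", "2020-06-11", "2020-07-11"], dtype="datetime64[D]")
window = np.timedelta64(30, "D")

# lower has shape (3, 1); comparing it with dates of shape (3,) broadcasts
# to a (3, 3) matrix where mask[i, j] is True when
# dates[i] <= dates[j] <= dates[i] + window.
lower = dates[:, None]
mask = (dates >= lower) & (dates <= lower + window)

# Column j sums the rows i that lie within 30 days before dates[j].
counts = mask.sum(axis=0)
print(counts)  # [1 2 2]
```

Because the mask is n x n per group, memory grows quadratically with group size, so this approach is best suited to groups that are not too large.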
ALollz answered Sep 29 '22 15:09