Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rolling sum based on all previous dates NOT previous rows sorted by date

Tags:

python

pandas

Given the following dataframe:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 |     15 |
| 27/05/2019 |     20 |
| 27/05/2019 |     15 |
| 25/06/2019 |     10 |
| 29/06/2019 |     25 |
| 01/07/2019 |     50 |
+------------+--------+

I need to get the rolling sum of all previous dates as follows:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 | NaN    |
| 27/05/2019 | 15     |
| 27/05/2019 | 15     |
| 15/06/2019 | 35     |
| 29/06/2019 | 10     |
| 01/07/2019 | 35     |
+------------+--------+

Using:

df = pd.DataFrame(
    {
        'Date': {
            0: datetime.datetime(2019, 5, 1),
            1: datetime.datetime(2019, 5, 27),
            2: datetime.datetime(2019, 5, 27),
            3: datetime.datetime(2019, 6, 15),
            4: datetime.datetime(2019, 6, 29),
            5: datetime.datetime(2019, 7, 1),
        },
        'Amount': {0: 15, 1: 20, 2: 15, 3: 10, 4: 25, 5: 50}
    }
)
df.sort_values("Date", inplace=True)
df_roll = df.rolling("28d", on="Date", closed="left").sum()

Gets me:

+------------+--------+
|    Date    | Amount |
+------------+--------+
| 01/05/2019 |    NaN |
| 27/05/2019 |     15 | 
| 27/05/2019 |     35 | <-- Should be 15
| 15/06/2019 |     35 |
| 29/06/2019 |     10 |
| 01/07/2019 |     35 |
+------------+--------+

Which isn't quite correct.

How would I get the sum of all previous dates rather than all previous rows?

like image 206
Jossy Avatar asked Dec 20 '21 01:12

Jossy


2 Answers

You can do

df['new'] = df.Date.map(df.groupby('Date').Amount.sum().rolling("28d", closed="left").sum())
df
        Date  Amount   new
0 2019-05-01      15   NaN
1 2019-05-27      20  15.0
2 2019-05-27      15  15.0
3 2019-06-15      10  35.0
4 2019-06-29      25  10.0
5 2019-07-01      50  35.0
like image 114
BENY Avatar answered Oct 23 '22 04:10

BENY


One way is to aggregate your amounts by date first, then compute the rolling sum, and join this sum to the original list of dates to apply the rolling sum to all dates

# Aggregate (sum) by date
df_agged = (df.groupby('Date')['Amount'].agg(['sum'])
            .reset_index()
            .rename(columns={'sum':'Amount'}))
# Compute rolling sum
df_agged_rolling = df_agged.rolling("28d",on="Date",closed='left').sum()

# Join on original dates to apply rolling sum to duplicate dates
df_with_rolling_agg = df.join(df_agged_rolling.set_index('Date'),on='Date',
                              lsuffix='_orig',rsuffix='_rolling_sum')
df_with_rolling_agg

#         Date  Amount_orig  Amount_rolling_sum
# 0 2019-05-01           15                 NaN
# 1 2019-05-27           20                15.0
# 2 2019-05-27           15                15.0
# 3 2019-06-15           10                35.0
# 4 2019-06-29           25                10.0
# 5 2019-07-01           50                35.0
like image 35
cookesd Avatar answered Oct 23 '22 03:10

cookesd