I have a rolling sum calculated on a grouped data frame but its adding up the wrong way, it is a sum of the future, when I need a sum of the past.
What am I doing wrong here?
I import the data and sort by Dimension and Date (I have tried removing the date sort already)
df = pd.read_csv('Input.csv', parse_dates=True)
df.sort_values(['Dimension','Date'])
print(df)
I then create a new column which is a multi index grouped by rolling window
new_column = df.groupby('Dimension').Value1.apply(lambda x: 
x.rolling(window=3).sum())
I then reset the index to be the same as the original
df['Sum_Value1'] = new_column.reset_index(level=0, drop=True)
print(df)
I have also tried reversing the index before the calculation, but that also failed.
Input
Dimension,Date,Value1,Value2
1,4/30/2002,10,20
1,1/31/2002,10,20
1,10/31/2001,10,20
1,7/31/2001,10,20
1,4/30/2001,10,20
1,1/31/2001,10,20
1,10/31/2000,10,20
2,4/30/2002,10,20
2,1/31/2002,10,20
2,10/31/2001,10,20
2,7/31/2001,10,20
2,4/30/2001,10,20
2,1/31/2001,10,20
2,10/31/2000,10,20
3,4/30/2002,10,20
3,1/31/2002,10,20
3,10/31/2001,10,20
3,7/31/2001,10,20
3,1/31/2001,10,20
3,10/31/2000,10,20
Output:
    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20         NaN
1           1   1/31/2002      10      20         NaN
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20        30.0
6           1  10/31/2000      10      20        30.0
7           2   4/30/2002      10      20         NaN
8           2   1/31/2002      10      20         NaN
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20        30.0
13          2  10/31/2000      10      20        30.0
Goal Output:
    Dimension        Date  Value1  Value2  Sum_Value1
0           1   4/30/2002      10      20        30.0
1           1   1/31/2002      10      20        30.0
2           1  10/31/2001      10      20        30.0
3           1   7/31/2001      10      20        30.0
4           1   4/30/2001      10      20        30.0
5           1   1/31/2001      10      20         NaN
6           1  10/31/2000      10      20         NaN
7           2   4/30/2002      10      20        30.0
8           2   1/31/2002      10      20        30.0
9           2  10/31/2001      10      20        30.0
10          2   7/31/2001      10      20        30.0
11          2   4/30/2001      10      20        30.0
12          2   1/31/2001      10      20         Nan
13          2  10/31/2000      10      20         NaN
                You need a backward sum, therefore reverse your series before sum rolling it:
lambda x: x[::-1].rolling(window=3).sum()
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With