Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use Pandas rolling_* functions on a forward-looking basis

Tags:

Suppose I have a time series:

In[138] rng = pd.date_range('1/10/2011', periods=10, freq='D') In[139] ts = pd.Series(randn(len(rng)), index=rng) In[140] Out[140]: 2011-01-10    0 2011-01-11    1 2011-01-12    2 2011-01-13    3 2011-01-14    4 2011-01-15    5 2011-01-16    6 2011-01-17    7 2011-01-18    8 2011-01-19    9 Freq: D, dtype: int64 

If I use one of the rolling_* functions, for instance rolling_sum, I can get the behavior I want for backward looking rolling calculations:

In [157]: pd.rolling_sum(ts, window=3, min_periods=0) Out[157]:  2011-01-10     0 2011-01-11     1 2011-01-12     3 2011-01-13     6 2011-01-14     9 2011-01-15    12 2011-01-16    15 2011-01-17    18 2011-01-18    21 2011-01-19    24 Freq: D, dtype: float64 

But what if I want to do a forward-looking sum? I've tried something like this:

In [161]: pd.rolling_sum(ts.shift(-2, freq='D'), window=3, min_periods=0) Out[161]:  2011-01-08     0 2011-01-09     1 2011-01-10     3 2011-01-11     6 2011-01-12     9 2011-01-13    12 2011-01-14    15 2011-01-15    18 2011-01-16    21 2011-01-17    24 Freq: D, dtype: float64 

But that's not exactly the behavior I want. What I am looking for as an output is:

2011-01-10    3 2011-01-11    6 2011-01-12    9 2011-01-13    12 2011-01-14    15 2011-01-15    18 2011-01-16    21 2011-01-17    24 2011-01-18    17 2011-01-19    9 

ie - I want the sum of the "current" day plus the next two days. My current solution is not sufficient because I care about what happens at the edges. I know I could solve this manually by setting up two additional columns that are shifted by 1 and 2 days respectively and then summing the three columns, but there's got to be a more elegant solution.

like image 566
user2543645 Avatar asked Apr 02 '14 18:04

user2543645


2 Answers

Why not just do it on the reversed Series (and reverse the answer):

In [11]: pd.rolling_sum(ts[::-1], window=3, min_periods=0)[::-1] Out[11]: 2011-01-10     3 2011-01-11     6 2011-01-12     9 2011-01-13    12 2011-01-14    15 2011-01-15    18 2011-01-16    21 2011-01-17    24 2011-01-18    17 2011-01-19     9 Freq: D, dtype: float64 
like image 137
Andy Hayden Avatar answered Sep 17 '22 17:09

Andy Hayden


I struggled with this then found an easy way using shift.

If you want a rolling sum for the next 10 periods, try:

df['NewCol'] = df['OtherCol'].shift(-10).rolling(10, min_periods = 0).sum() 

We use shift so that "OtherCol" shows up 10 rows ahead of where it normally would be, then we do a rolling sum over the previous 10 rows. Because we shifted, the previous 10 rows are actually the future 10 rows of the unshifted column. :)

like image 41
MitchellRosenthal256 Avatar answered Sep 19 '22 17:09

MitchellRosenthal256