Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Change rolling window size as it rolls

I have a pandas data frame like this;

>df

leg    speed
  1       10
  1       11
  1       12
  1       13
  1       12
  1       15
  1       19
  1       12
  2       10
  2       10
  2       12
  2       15
  2       19
  2       11
  :        :

I want to make a new column roll_speed where it takes a rolling average speed of the last 5 positions. But I wanna put more detailed condition in it.

  1. Groupby leg(it doesn't take into account the speed of the rows in different leg.
  2. I want the rolling window to be changed from 1 to 5 maximum according to the available rows. For example in leg == 1, in the first row there is only one row to calculate, so the rolling speed should be 10/1 = 10. For the second row, there are only two rows available for calculation, the rolling speed should be (10+11)/2 = 10.5.

    leg    speed   roll_speed
      1       10           10    # 10/1
      1       11           10.5  # (10+11)/2
      1       12           11    # (10+11+12)/3
      1       13           11.5  # (10+11+12+13)/4
      1       12           11.6  # (10+11+12+13+12)/5
      1       15           12.6  # (11+12+13+12+15)/5
      1       19           14.2  # (12+13+12+15+19)/5
      1       12           14.2  # (13+12+15+19+12)/5
      2       10           10    # 10/1
      2       10           10    # (10+10)/2
      2       12           10.7  # (10+10+12)/3
      2       15           11.8  # (10+10+12+15)/4
      2       19           13.2  # (10+10+12+15+19)/5
      2       11           13.4  # (10+12+15+19+11)/5
      :        :           
    

My attempt:

df['roll_speed'] = df.speed.rolling(5).mean()

But it just returns NA for rows where less than five rows are available for calculation. How should I solve this problem? Thank you for any help!

like image 678
Makoto Miyazaki Avatar asked Aug 13 '18 15:08

Makoto Miyazaki


People also ask

What is a rolling window calculation?

Window Rolling Mean (Moving Average) The moving average calculation creates an updated average value for each row based on the window we specify. The calculation is also called a “rolling mean” because it's calculating an average of values within a specified range for each row as you go along the DataFrame.

What is window size in time series?

Anomaly functions apply a sliding window to a signal of time series data to capture patterns in the signal. The window size determines the size of the sliding window.

What is rolling window backtesting?

In the rolling window backtesting methodology, researchers use a rolling window (or walk-forward) framework, fit/calibrate factors or trade signals based on the rolling window, rebalance the portfolio periodically, and then track the performance over time.

What does DF Rolling do?

rolling() function provides the feature of rolling window calculations.


Video Answer


2 Answers

Set the parameter min_periods to 1

df['roll_speed'] = df.groupby('leg').speed.rolling(5, min_periods = 1).mean()\
.round(1).reset_index(drop = True)

    leg speed   roll_speed
0   1   10  10.0
1   1   11  10.5
2   1   12  11.0
3   1   13  11.5
4   1   12  11.6
5   1   15  12.6
6   1   19  14.2
7   1   12  14.2
8   2   10  10.0
9   2   10  10.0
10  2   12  10.7
11  2   15  11.8
12  2   19  13.2
13  2   11  13.4
like image 170
Vaishali Avatar answered Oct 18 '22 18:10

Vaishali


Using rolling(5) will get you your results for all but the first 4 occurences of each group. We can fill the remaining values with the expanding mean:

(df.groupby('leg').speed.rolling(5)
    .mean().fillna(df.groupby('leg').speed.expanding().mean())
).reset_index(drop=True)

0     10.000000
1     10.500000
2     11.000000
3     11.500000
4     11.600000
5     12.600000
6     14.200000
7     14.200000
8     10.000000
9     10.000000
10    10.666667
11    11.750000
12    13.200000
13    13.400000
Name: speed, dtype: float64
like image 2
user3483203 Avatar answered Oct 18 '22 20:10

user3483203