
How to get a continuous rolling mean in pandas?

Looking to get a continuous rolling mean of a dataframe.

df looks like this

index price

0      4
1      6
2      10
3      12

I'm looking to get a continuous rolling mean of price.

The goal is to have it look like this, with a moving mean of all the prices so far:

index price  mean

0      4      4
1      6      5
2      10     6.67
3      12     8 

Thank you in advance!

TroyMcClure8998 asked Jan 06 '20 04:01


2 Answers

You can use expanding():

df['mean'] = df.price.expanding().mean()

df
index   price   mean
0       4       4.000000
1       6       5.000000
2       10      6.666667
3       12      8.000000
Allen answered Oct 20 '22 09:10


Welcome to SO: Hopefully people will soon remember you from prior SO posts, such as this one.

From your example, it seems that @Allen has given you code that produces the answer in your table. That said, this isn't exactly the same as a "rolling" mean. The expanding() method Allen uses takes the first row on its own and divides by n (which is 1), then adds the first two rows and divides by n (which is now 2), and so on, so that the last row is (4+6+10+12)/4 = 8.
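
To make that arithmetic concrete, here is a minimal sketch that rebuilds the expanding mean by hand as a running total divided by the row count. The column name manual_mean and the helper series n are just names picked for this illustration:

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({"price": [4, 6, 10, 12]})

# n = number of rows seen so far: 1, 2, 3, 4
n = pd.Series(range(1, len(df) + 1), index=df.index)

# Running total of price divided by n
df["manual_mean"] = df["price"].cumsum() / n

# Built-in equivalent
df["mean"] = df["price"].expanding().mean()

print(df)
```

Both columns should agree row for row, which is a quick way to check that expanding().mean() is the "mean of everything so far".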

This last number is also what you would get from a rolling mean with a window of 4, since that window covers exactly 4 observations. However, if you keep moving forward with a window size of 4 and start including more rows, then the answer from expanding() will generally differ from what you want. In effect, expanding() records the mean of the entire series so far (price in this case), as though it were receiving a new piece of data at each row. "Rolling", on the other hand, gives you an aggregation over only the most recent rows that fit in a fixed window size.
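
A small sketch of where the two diverge. The first four prices are from your question, but the last two (14 and 20) are made up purely to extend the series past the window size:

```python
import pandas as pd

# First four prices are from the question; 14 and 20 are made up for illustration
prices = pd.Series([4, 6, 10, 12, 14, 20], name="price")

out = pd.DataFrame({
    "price": prices,
    # Mean of every price seen so far
    "expanding_mean": prices.expanding().mean(),
    # Mean of only the 4 most recent prices
    "rolling_4": prices.rolling(4).mean(),
})
print(out)
```

The two columns match at row 3 (both 8.0), then split: at row 4 the expanding mean is 46/5 = 9.2 while the window-4 mean is (6+10+12+14)/4 = 10.5.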

Here's another option for doing rolling calculations: the rolling() method, available on both pandas.Series and pandas.DataFrame.

In your case, you would do:

df['rolling_mean'] = df.price.rolling(4).mean()

df
index   price   rolling_mean
0       4       NaN
1       6       NaN
2       10      NaN
3       12      8.000000

Those NaN values are a result of the windowing: until there are enough rows to fill the window, the result is NaN. You could set a smaller window:

df['rolling_mean'] = df.price.rolling(2).mean()

df
index   price   rolling_mean
0       4       NaN
1       6       5.000000
2       10      8.000000
3       12      11.000000

This shows both the reduction in NaN entries and how rolling works: it's only averaging within the size-two window you provided. That results in different df['rolling_mean'] values than you get from df.price.expanding().mean().

Note: you can get rid of the NaN by using .rolling(2, min_periods=1), which tells the function the minimum number of non-NaN values that have to be present within a window to calculate a result.
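
A short sketch of min_periods on the data from the question. The column names rolling_2, rolling_all and expanding are only labels chosen for this example. It also shows that a window as long as the whole frame, combined with min_periods=1, gives the same numbers as expanding() on this data:

```python
import pandas as pd

df = pd.DataFrame({"price": [4, 6, 10, 12]})

# Window of 2, but allow a result as soon as 1 value is available
df["rolling_2"] = df["price"].rolling(2, min_periods=1).mean()

# Window as long as the whole frame: matches expanding() on this data
df["rolling_all"] = df["price"].rolling(len(df), min_periods=1).mean()
df["expanding"] = df["price"].expanding().mean()

print(df)
```

With min_periods=1 the first row of rolling_2 is 4.0 instead of NaN, since a single value is enough to produce a mean.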

Savage Henry answered Oct 20 '22 09:10