
Pandas: rolling mean by time interval

I've got a bunch of polling data; I want to compute a Pandas rolling mean to get an estimate for each day based on a three-day window. According to this question, the rolling_* functions compute the window based on a specified number of values, and not a specific datetime range.

How do I implement this functionality?

Sample input data:

polls_subset.tail(20)
Out[185]:
            favorable  unfavorable  other
enddate
2012-10-25       0.48         0.49   0.03
2012-10-25       0.51         0.48   0.02
2012-10-27       0.51         0.47   0.02
2012-10-26       0.56         0.40   0.04
2012-10-28       0.48         0.49   0.04
2012-10-28       0.46         0.46   0.09
2012-10-28       0.48         0.49   0.03
2012-10-28       0.49         0.48   0.03
2012-10-30       0.53         0.45   0.02
2012-11-01       0.49         0.49   0.03
2012-11-01       0.47         0.47   0.05
2012-11-01       0.51         0.45   0.04
2012-11-03       0.49         0.45   0.06
2012-11-04       0.53         0.39   0.00
2012-11-04       0.47         0.44   0.08
2012-11-04       0.49         0.48   0.03
2012-11-04       0.52         0.46   0.01
2012-11-04       0.50         0.47   0.03
2012-11-05       0.51         0.46   0.02
2012-11-07       0.51         0.41   0.00

Output would have only one row for each date.

Anov asked Apr 02 '13 18:04




2 Answers

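In the meantime, a time-window capability was added: rolling() accepts an offset string such as '2s' as the window, in addition to a fixed number of rows. See this link.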

In [1]: df = DataFrame({'B': range(5)})

In [2]: df.index = [Timestamp('20130101 09:00:00'),
   ...:             Timestamp('20130101 09:00:02'),
   ...:             Timestamp('20130101 09:00:03'),
   ...:             Timestamp('20130101 09:00:05'),
   ...:             Timestamp('20130101 09:00:06')]

In [3]: df
Out[3]:
                     B
2013-01-01 09:00:00  0
2013-01-01 09:00:02  1
2013-01-01 09:00:03  2
2013-01-01 09:00:05  3
2013-01-01 09:00:06  4

In [4]: df.rolling(2, min_periods=1).sum()
Out[4]:
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  5.0
2013-01-01 09:00:06  7.0

In [5]: df.rolling('2s', min_periods=1).sum()
Out[5]:
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  3.0
2013-01-01 09:00:06  7.0
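Applied to the polling data from the question, the same offset-window idea might look like the sketch below. It uses only a few rows of the sample and one possible reading of "one row per date" (keep the last rolling value for each day); the names `polls`, `rolling_3d` and `daily` are made up for illustration. Note that this averages over individual polls in the window, so a day with many polls weighs more than a day with one.

```python
import pandas as pd

# A few rows from the question's sample, indexed by poll end date
# (deliberately left unsorted, as in the original data).
polls = pd.DataFrame(
    {
        "favorable": [0.48, 0.51, 0.51, 0.56, 0.48],
        "unfavorable": [0.49, 0.48, 0.47, 0.40, 0.49],
    },
    index=pd.to_datetime(
        ["2012-10-25", "2012-10-25", "2012-10-27", "2012-10-26", "2012-10-28"]
    ),
)
polls.index.name = "enddate"

# Offset-based windows need a monotonic DatetimeIndex, so sort first.
polls = polls.sort_index()

# "3D" is a time window, not a row count: with the default right-closed
# window, each row averages the polls from that day and the two days before.
rolling_3d = polls.rolling("3D").mean()

# Several polls can share a date; keep the last rolling value per date,
# which covers every poll ending on or before that day within the window.
daily = rolling_3d.groupby(level=0).last()
print(daily)
```

If each day should count equally regardless of how many polls it has, average per day first and roll over the daily means instead.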
Martin answered Oct 15 '22 17:10


What about something like this:

First, resample the data frame into 1D intervals; this takes the mean of the values for all duplicate days. Use the fill_method option to fill in missing date values. Next, pass the resampled frame into pd.rolling_mean with a window of 3 and min_periods=1:

pd.rolling_mean(df.resample("1D", fill_method="ffill"), window=3, min_periods=1)

            favorable  unfavorable     other
enddate
2012-10-25   0.495000     0.485000  0.025000
2012-10-26   0.527500     0.442500  0.032500
2012-10-27   0.521667     0.451667  0.028333
2012-10-28   0.515833     0.450000  0.035833
2012-10-29   0.488333     0.476667  0.038333
2012-10-30   0.495000     0.470000  0.038333
2012-10-31   0.512500     0.460000  0.029167
2012-11-01   0.516667     0.456667  0.026667
2012-11-02   0.503333     0.463333  0.033333
2012-11-03   0.490000     0.463333  0.046667
2012-11-04   0.494000     0.456000  0.043333
2012-11-05   0.500667     0.452667  0.036667
2012-11-06   0.507333     0.456000  0.023333
2012-11-07   0.510000     0.443333  0.013333

UPDATE: As Ben points out in the comments, with pandas 0.18.0 the syntax has changed. With the new syntax this would be:

df.resample("1D").mean().ffill().rolling(window=3, min_periods=1).mean()
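For reference, here is a self-contained sketch of the resample-then-roll steps described in this answer (daily mean, forward fill, then a 3-row rolling mean), written with the method-chaining syntax of recent pandas. It uses only the favorable column from the first nine sample rows of the question; the intermediate names `daily_mean`, `filled` and `smoothed` are made up for illustration.

```python
import pandas as pd

# "favorable" values from the first nine sample rows in the question.
df = pd.DataFrame(
    {"favorable": [0.48, 0.51, 0.51, 0.56, 0.48, 0.46, 0.48, 0.49, 0.53]},
    index=pd.to_datetime(
        [
            "2012-10-25", "2012-10-25", "2012-10-27", "2012-10-26",
            "2012-10-28", "2012-10-28", "2012-10-28", "2012-10-28",
            "2012-10-30",
        ]
    ),
)
df.index.name = "enddate"

# Step 1: one row per calendar day, averaging polls that share a date.
# Days without any poll (here 2012-10-29) come out as NaN.
daily_mean = df.sort_index().resample("1D").mean()

# Step 2: fill the empty days with the most recent daily value.
filled = daily_mean.ffill()

# Step 3: 3-row rolling mean; min_periods=1 keeps the first two days.
smoothed = filled.rolling(window=3, min_periods=1).mean()
print(smoothed)
```

Because the frame has exactly one row per day after resampling, a row-count window of 3 here is the same as a three-day window.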
Zelazny7 answered Oct 15 '22 15:10