I am trying to set center=True in the pandas rolling function for a time series:
import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling('7D', min_periods=1, center=True, closed='left')
But the output is:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-6-6b30c16a2d12> in <module>()
1 import pandas as pd
2 series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
----> 3 series.rolling('7D', min_periods=1, center=True, closed='left')
~\Anaconda3\lib\site-packages\pandas\core\generic.py in rolling(self, window, min_periods, freq, center, win_type, on, axis, closed)
6193 min_periods=min_periods, freq=freq,
6194 center=center, win_type=win_type,
-> 6195 on=on, axis=axis, closed=closed)
6196
6197 cls.rolling = rolling
~\Anaconda3\lib\site-packages\pandas\core\window.py in rolling(obj, win_type, **kwds)
2050 return Window(obj, win_type=win_type, **kwds)
2051
-> 2052 return Rolling(obj, **kwds)
2053
2054
~\Anaconda3\lib\site-packages\pandas\core\window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, closed, **kwargs)
84 self.win_freq = None
85 self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 86 self.validate()
87
88 @property
~\Anaconda3\lib\site-packages\pandas\core\window.py in validate(self)
1090 # we don't allow center
1091 if self.center:
-> 1092 raise NotImplementedError("center is not implemented "
1093 "for datetimelike and offset "
1094 "based windows")
NotImplementedError: center is not implemented for datetimelike and offset based windows
Expected output is the one generated by:
import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling(7, min_periods=1, center=True).sum().head(10)
2014-01-01 4.0
2014-01-02 5.0
2014-01-03 6.0
2014-01-04 7.0
2014-01-05 7.0
2014-01-06 7.0
2014-01-07 7.0
2014-01-08 7.0
2014-01-09 7.0
2014-01-10 7.0
Freq: D, dtype: float64
However, I would like to get this result using datetime-like offsets, since that simplifies other parts of my code (not posted here).
Is there any alternative solution?
Thanks
Window Rolling Mean (Moving Average): The moving average calculation creates an updated average value for each row based on the window we specify. The calculation is also called a "rolling mean" because it calculates an average of the values within a specified range for each row as you move along the DataFrame.
The rolling() function provides rolling window calculations. Its window argument is the size of the moving window, that is, the number of observations used for calculating the statistic. With an integer window, each window has a fixed size.
Rolling sum using pandas rolling().sum(): in a call such as rolling(n).sum(), n is the size of the moving window, that is, the number of observations used to compute the rolling statistic (in our case, the sum).
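As a small illustration of the fixed-size window described above, the following sketch computes a rolling sum and a rolling mean over n = 3 observations. The sample values and the variable names are made up for the example.

```python
import pandas as pd

# Made-up sample data, only for illustration
values = pd.Series([1, 2, 3, 4, 5])

n = 3  # window size: number of observations per window

# The first n - 1 results are NaN because the window is not yet full
rolling_sum = values.rolling(n).sum()
rolling_mean = values.rolling(n).mean()

print(rolling_sum)
print(rolling_mean)
```

Passing min_periods=1 would instead return a value as soon as one observation is available, which is what the question does.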
pandas contains extensive capabilities and features for working with time series data for all domains. Using the NumPy datetime64 and timedelta64 dtypes, pandas has consolidated a large number of features from other Python libraries like scikits.
Try the following (tested with pandas==0.23.3):
series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')
This will center your rolling sum in the 7-day window (by shifting -3.5 days), and will allow you to use a 'datetimelike' string for defining the window size. Note that shift() only takes an integer number of periods, so the 3.5-day shift is expressed as 84 hours.
This will produce your desired output:
series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')['2014-01-01':].head(10)
2014-01-01 12:00:00 4.0
2014-01-02 12:00:00 5.0
2014-01-03 12:00:00 6.0
2014-01-04 12:00:00 7.0
2014-01-05 12:00:00 7.0
2014-01-06 12:00:00 7.0
2014-01-07 12:00:00 7.0
2014-01-08 12:00:00 7.0
2014-01-09 12:00:00 7.0
2014-01-10 12:00:00 7.0
Freq: D, dtype: float64
Note that the rolling sum is assigned to the center of the 7-day windows (using midnight to midnight timestamps), so the centered timestamp includes '12:00:00'.
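To check that the shifted result really lines up with the integer-window version from the question, here is a self-contained sketch. It moves the labels by subtracting half the window from the index, which should be equivalent to shift(-84, freq='h'). The names half_window, trailing, centered, relabelled and reference are only illustrative.

```python
import pandas as pd

series = pd.Series(1, index=pd.date_range('2014-01-01', '2014-04-01', freq='D'))

# Half of the 7-day window: 3 days 12 hours
half_window = pd.Timedelta('7D') / 2

# Trailing sum over [t - 7D, t), labelled at the right edge t
trailing = series.rolling('7D', min_periods=1, closed='left').sum()

# Move every label back by half the window so it sits at the window center
centered = trailing.copy()
centered.index = trailing.index - half_window

# Integer-window result from the question, used as the reference
reference = series.rolling(7, min_periods=1, center=True).sum()

# Drop the 12:00 time component and compare on the days both results cover
relabelled = centered.copy()
relabelled.index = centered.index.normalize()
common = relabelled.index.intersection(reference.index)
matches = (relabelled.loc[common] == reference.loc[common]).all()

print(centered.loc['2014-01-01':].head(10))
print(matches)
```

The shifted series stops a few days before the end of the data, because no trailing window is labelled after the last timestamp, so the comparison only covers the overlapping dates.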
Another option (as you show at the end of your question) is to resample the data to make sure it has an even datetime frequency, then use an integer for the window size (window=7) and center=True. However, you state that other parts of your code benefit from defining window with a 'datetimelike' string, so perhaps this option is not ideal.
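A minimal sketch of that resampling option, assuming the goal is a sum and that days without data should count as zero. The irregular sample dates and the variable names are made up for the example.

```python
import pandas as pd

# Made-up irregular series: some days are missing
idx = pd.to_datetime(['2014-01-01', '2014-01-02', '2014-01-04',
                      '2014-01-05', '2014-01-08'])
irregular = pd.Series(1, index=idx)

# Resample to an even daily frequency; empty days sum to 0
regular = irregular.resample('D').sum()

# With evenly spaced rows, an integer window of 7 rows spans 7 days
centered = regular.rolling(7, min_periods=1, center=True).sum()

print(centered)
```

For statistics other than a sum, filling missing days with zero may not be appropriate, so the aggregation used in resample() would need to be chosen accordingly.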