
Use center in pandas rolling when using a time-series

I am trying to set center=True in the pandas rolling function for a time series:

import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling('7D', min_periods=1, center=True, closed='left')

But the output is:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-6-6b30c16a2d12> in <module>()
      1 import pandas as pd
      2 series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
----> 3 series.rolling('7D', min_periods=1, center=True, closed='left')

~\Anaconda3\lib\site-packages\pandas\core\generic.py in rolling(self, window, min_periods, freq, center, win_type, on, axis, closed)
   6193                                    min_periods=min_periods, freq=freq,
   6194                                    center=center, win_type=win_type,
-> 6195                                    on=on, axis=axis, closed=closed)
   6196 
   6197         cls.rolling = rolling

~\Anaconda3\lib\site-packages\pandas\core\window.py in rolling(obj, win_type, **kwds)
   2050         return Window(obj, win_type=win_type, **kwds)
   2051 
-> 2052     return Rolling(obj, **kwds)
   2053 
   2054 

~\Anaconda3\lib\site-packages\pandas\core\window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, closed, **kwargs)
     84         self.win_freq = None
     85         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 86         self.validate()
     87 
     88     @property

~\Anaconda3\lib\site-packages\pandas\core\window.py in validate(self)
   1090             # we don't allow center
   1091             if self.center:
-> 1092                 raise NotImplementedError("center is not implemented "
   1093                                           "for datetimelike and offset "
   1094                                           "based windows")

NotImplementedError: center is not implemented for datetimelike and offset based windows

Expected output is the one generated by:

import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling(7, min_periods=1, center=True).sum().head(10)

2014-01-01    4.0
2014-01-02    5.0
2014-01-03    6.0
2014-01-04    7.0
2014-01-05    7.0
2014-01-06    7.0
2014-01-07    7.0
2014-01-08    7.0
2014-01-09    7.0
2014-01-10    7.0
Freq: D, dtype: float64

However, I would like to use datetime-like offsets for the window, since that simplifies other parts of my code (not posted here).

Is there any alternative solution?

Thanks

karen asked Oct 30 '17 12:10



1 Answer

Try the following (tested with pandas==0.23.3):

series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')

This centers your rolling sum in the 7-day window (by shifting the index by -3.5 days) and still lets you use a 'datetimelike' string to define the window size. Note that the periods argument of shift() must be an integer, so the 3.5-day shift is expressed as 84 hours.

This will produce your desired output:

series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')['2014-01-01':].head(10)

2014-01-01 12:00:00    4.0
2014-01-02 12:00:00    5.0
2014-01-03 12:00:00    6.0
2014-01-04 12:00:00    7.0
2014-01-05 12:00:00    7.0
2014-01-06 12:00:00    7.0
2014-01-07 12:00:00    7.0
2014-01-08 12:00:00    7.0
2014-01-09 12:00:00    7.0
2014-01-10 12:00:00    7.0
Freq: D, dtype: float64

Note that the rolling sum is assigned to the center of the 7-day windows (using midnight to midnight timestamps), so the centered timestamp includes '12:00:00'.
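As a sanity check, the shifted result can be compared against the integer-window version from the question. This is only a minimal sketch, assuming daily data as in the question; the names trailing, shifted, aligned and reference are illustrative and not part of pandas.

```python
import pandas as pd

series = pd.Series(1, index=pd.date_range('2014-01-01', '2014-04-01', freq='D'))

# Trailing 7-day window [t - 7D, t), then relabel each value to the window midpoint.
trailing = series.rolling('7D', min_periods=1, closed='left').sum()
shifted = trailing.shift(-84, freq='h').loc['2014-01-01':]

# Drop the 12:00:00 time component so the labels line up with the daily index.
aligned = shifted.copy()
aligned.index = aligned.index.normalize()

# Reference: the integer-window, center=True result from the question.
reference = series.rolling(7, min_periods=1, center=True).sum()

# Compare values on the dates both results share.
matches = (aligned.to_numpy() == reference.loc[aligned.index].to_numpy()).all()

# Dates covered by the reference but not by the shifted result.
missing_tail = reference.index.difference(aligned.index)

print(matches)
print(missing_tail)
```

On this data the values agree wherever both exist, but the shifted series ends on 2014-03-28, so the last four days covered by the center=True version have no counterpart.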

Another option (as you show at the end of your question) is to resample the data so that it has a regular datetime frequency, then use an integer window size (window = 7) with center=True. However, you state that other parts of your code benefit from defining the window with a 'datetimelike' string, so perhaps this option is not ideal.
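The resampling route might look like the sketch below. The irregular input is made up for illustration (three days dropped from the question's series), and resample('D').sum() fills the missing days with 0, which suits a rolling sum but may not suit other statistics.

```python
import pandas as pd

# Hypothetical irregular input: the question's series with three days removed.
full = pd.Series(1, index=pd.date_range('2014-01-01', '2014-04-01', freq='D'))
irregular = full.drop(full.index[[2, 3, 10]])

# Resample to a regular daily grid; days without data sum to 0.
daily = irregular.resample('D').sum()

# With a regular grid, an integer window can be centered.
centered = daily.rolling(7, min_periods=1, center=True).sum()

print(centered.head(10))
```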

PJW answered Oct 02 '22 10:10