Here's a code snippet from Pandas Issue #13966
dates = pd.date_range(start='2016-01-01 09:30:00', periods=20, freq='s')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})
Fails:
df.groupby('A').rolling('4s', on='B').C.mean()
ValueError: B must be monotonic
Per the issue linked above, this seems to be a bug. Does anyone have a good workaround?
Set B as the index first so as to use Groupby.resample method on it.
df.set_index('B', inplace=True)
Groupby A and resample based on seconds frequency. As resample cannot be directly used with rolling, use ffill(forward fillna with NaN limit as 0).
Now use rolling function by specifying the window size as 4 (because of freq=4s) interval and take it's mean along C column as shown:
for _, grp in df.groupby('A'):
    print (grp.resample('s').ffill(limit=0).rolling(4)['C'].mean().head(10)) #Remove head() 
Resulting output obtained:
B
2016-01-01 09:30:00    NaN
2016-01-01 09:30:01    NaN
2016-01-01 09:30:02    NaN
2016-01-01 09:30:03    1.5
2016-01-01 09:30:04    2.5
2016-01-01 09:30:05    3.5
2016-01-01 09:30:06    4.5
2016-01-01 09:30:07    5.5
2016-01-01 09:30:08    6.5
2016-01-01 09:30:09    7.5
Freq: S, Name: C, dtype: float64
B
2016-01-01 09:30:00     NaN
2016-01-01 09:30:01     NaN
2016-01-01 09:30:02     NaN
2016-01-01 09:30:03    21.5
2016-01-01 09:30:04    22.5
2016-01-01 09:30:05    23.5
2016-01-01 09:30:06    24.5
2016-01-01 09:30:07    25.5
2016-01-01 09:30:08    26.5
2016-01-01 09:30:09    27.5
Freq: S, Name: C, dtype: float64
B
2016-01-01 09:30:12     NaN
2016-01-01 09:30:13     NaN
2016-01-01 09:30:14     NaN
2016-01-01 09:30:15    33.5
2016-01-01 09:30:16    34.5
2016-01-01 09:30:17    35.5
2016-01-01 09:30:18    36.5
2016-01-01 09:30:19    37.5
Freq: S, Name: C, dtype: float64
TL;DR
Use groupby.apply as a workaround instead after setting the index appropriately:
# tested in version - 0.19.1
df.groupby('A').apply(lambda grp: grp.resample('s').ffill(limit=0).rolling(4)['C'].mean())
(Or)
# Tested in OP's version - 0.19.0
df.groupby('A').apply(lambda grp: grp.resample('s').ffill().rolling(4)['C'].mean())
Both work.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With