Time-based .rolling() fails with group by

Question

Here's a code snippet from Pandas Issue #13966

dates = pd.date_range(start='2016-01-01 09:30:00', periods=20, freq='s')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})

Fails:

df.groupby('A').rolling('4s', on='B').C.mean()
ValueError: B must be monotonic

Per the issue linked above, this seems to be a bug. Does anyone have a good workaround?

Nickil Maveli · Accepted Answer

Set B as the index first so as to use Groupby.resample method on it.

df.set_index('B', inplace=True)

Groupby A and resample based on seconds frequency. As resample cannot be directly used with rolling, use ffill(forward fillna with NaN limit as 0). Now use rolling function by specifying the window size as 4 (because of freq=4s) interval and take it's mean along C column as shown:

for _, grp in df.groupby('A'):
    print (grp.resample('s').ffill(limit=0).rolling(4)['C'].mean().head(10)) #Remove head()

Resulting output obtained:

B
2016-01-01 09:30:00    NaN
2016-01-01 09:30:01    NaN
2016-01-01 09:30:02    NaN
2016-01-01 09:30:03    1.5
2016-01-01 09:30:04    2.5
2016-01-01 09:30:05    3.5
2016-01-01 09:30:06    4.5
2016-01-01 09:30:07    5.5
2016-01-01 09:30:08    6.5
2016-01-01 09:30:09    7.5
Freq: S, Name: C, dtype: float64
B
2016-01-01 09:30:00     NaN
2016-01-01 09:30:01     NaN
2016-01-01 09:30:02     NaN
2016-01-01 09:30:03    21.5
2016-01-01 09:30:04    22.5
2016-01-01 09:30:05    23.5
2016-01-01 09:30:06    24.5
2016-01-01 09:30:07    25.5
2016-01-01 09:30:08    26.5
2016-01-01 09:30:09    27.5
Freq: S, Name: C, dtype: float64
B
2016-01-01 09:30:12     NaN
2016-01-01 09:30:13     NaN
2016-01-01 09:30:14     NaN
2016-01-01 09:30:15    33.5
2016-01-01 09:30:16    34.5
2016-01-01 09:30:17    35.5
2016-01-01 09:30:18    36.5
2016-01-01 09:30:19    37.5
Freq: S, Name: C, dtype: float64

TL;DR

Use groupby.apply as a workaround instead after setting the index appropriately:

# tested in version - 0.19.1
df.groupby('A').apply(lambda grp: grp.resample('s').ffill(limit=0).rolling(4)['C'].mean())

(Or)

# Tested in OP's version - 0.19.0
df.groupby('A').apply(lambda grp: grp.resample('s').ffill().rolling(4)['C'].mean())

Both work.

Time-based .rolling() fails with group by

Tags:

python

pandas

group-by

Huey

1 Answers

Nickil Maveli

Recent Activity

Donate For Us

Time-based .rolling() fails with group by

Tags:

python

pandas

group-by

Huey

1 Answers

Nickil Maveli

Related questions

Recent Activity

Donate For Us