
Pandas .rolling specifying time window and win_type

I want to compute a moving average using a time window over an irregular time series using pandas. Ideally, the window should be exponentially weighted using pandas.DataFrame.ewm, but the arguments (e.g. span) do not accept time-based windows. If we try to use pandas.DataFrame.rolling, we realise that we cannot combine time-based windows with win_type.

import pandas as pd

dft = pd.DataFrame({'B': [0, 1, 2, 3, 4]},
                   index = pd.Index([pd.Timestamp('20130101 09:00:00'),
                                     pd.Timestamp('20130101 09:00:02'),
                                     pd.Timestamp('20130101 09:00:03'),
                                     pd.Timestamp('20130101 09:00:05'),
                                     pd.Timestamp('20130101 09:00:06')],
                                    name='foo'))
dft.rolling('2s', win_type='triang').sum()
>>> ValueError: Invalid window 2s

How can I calculate a time-based moving average with non-uniform weights over an irregular time series?

For example, the expected output of something like dft.ewm(alpha=0.9, adjust=False).sum() restricted to a window of '2s' would be [0*1, 1*1, 2*1+1*0.9, 3*1, 4*1+3*0.9], i.e. [0, 1, 2.9, 3, 6.7].

Elrond asked Nov 19 '17 13:11



1 Answer

The pandas documentation is misleading here. As you found out, you can't pass an offset (a time-based window) together with win_type. As a workaround, you can pass your own weighting function to .apply. For example, if you want to use a triangular window:

import pandas as pd
from scipy.signal.windows import triang

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.Index(
        [
            pd.Timestamp("20130101 09:00:00"),
            pd.Timestamp("20130101 09:00:02"),
            pd.Timestamp("20130101 09:00:03"),
            pd.Timestamp("20130101 09:00:05"),
            pd.Timestamp("20130101 09:00:06"),
        ],
        name="foo",
    ),
)


def triangle_sum(window):
    # With raw=True, `window` is a NumPy array holding the values of one window.
    weights = triang(len(window))
    return (weights * window).sum()


dft.rolling("2s").apply(triangle_sum, raw=True)
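Note that triangle_sum only sees the values, so its weights depend on how many observations fall in the window, not on when they occurred. If the weights should depend on elapsed time instead, one option is raw=False, which passes each window as a Series that still carries its timestamps. Here is a minimal sketch under that assumption. The names WINDOW and linear_time_sum and the linear taper itself are made up for illustration; they are not part of pandas:

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# Assumption: must match the offset passed to .rolling() below.
WINDOW = pd.Timedelta("2s")


def linear_time_sum(window: pd.Series) -> float:
    # Age of each observation relative to the newest one in the window, in seconds.
    age = np.asarray((window.index[-1] - window.index).total_seconds())
    # Hypothetical scheme: weight falls linearly from 1 (age 0) towards 0 (age == WINDOW).
    weights = 1.0 - age / WINDOW.total_seconds()
    return float((weights * window.to_numpy()).sum())


# raw=False hands each window to the function as a Series indexed by timestamp.
time_weighted = dft["B"].rolling("2s").apply(linear_time_sum, raw=False)
print(time_weighted)
```

raw=False is noticeably slower than raw=True because a Series is built for every window, so it is only worth it when the weights really need the timestamps.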

In the same way, you can define your own weighting scheme, and use Numba if performance is a concern.
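As an illustration of a custom scheme, here is a sketch of an exponentially decaying weighted sum that should reproduce the numbers from the question. It assumes the decay applies per observation counted back from the newest value in the window, with the factor 0.9 taken from the question's expected output. DECAY and exp_weighted_sum are names I made up:

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame(
    {"B": [0, 1, 2, 3, 4]},
    index=pd.DatetimeIndex(
        [
            "2013-01-01 09:00:00",
            "2013-01-01 09:00:02",
            "2013-01-01 09:00:03",
            "2013-01-01 09:00:05",
            "2013-01-01 09:00:06",
        ],
        name="foo",
    ),
)

# Assumption: decay factor read off the question's expected output (1*0.9, 3*0.9).
DECAY = 0.9


def exp_weighted_sum(values: np.ndarray) -> float:
    # Number of steps each value lies behind the newest one: [n-1, ..., 1, 0].
    steps_back = np.arange(len(values) - 1, -1, -1)
    # Newest value gets weight 1, the previous one DECAY, then DECAY**2, and so on.
    weights = DECAY ** steps_back
    return float((weights * values).sum())


exp_weighted = dft["B"].rolling("2s").apply(exp_weighted_sum, raw=True)
print(exp_weighted)  # approximately 0, 1, 2.9, 3, 6.7
```

If Numba is installed, .apply also accepts engine="numba" together with raw=True; I have not benchmarked that path here, so treat it as something to try rather than a guaranteed speed-up.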

onepan answered Sep 28 '22 15:09
