I have a multi-index DataFrame in pandas, indexed on ID and timestamp. I want to compute a time-series rolling sum for each ID, but I can't figure out how to do it without loops.
import io
import pandas as pd

content = io.BytesIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")
df = pd.read_table(content, header=0, sep='\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)
pd.stats.moments.rolling_sum(df, window=2)
And the output for this is:
                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    303
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    411
Notice the overlap at the edges between IDs 0 and 1, and between IDs 1 and 2: the window runs across ID boundaries, which I don't want because it messes up my calculations. One possible workaround is to group by IDs, loop over the groups, and apply rolling_sum to each one.
I am sure there is a function to help me do this without using loops.
Group first, then roll the sum (rolling_sum is also available in the top-level namespace):
In [18]: df.groupby(level='IDs').apply(lambda x: pd.rolling_sum(x,2))
Out[18]:
                value
IDs timestamp
0   2010-10-30    NaN
    2010-11-30      3
    2011-11-30      5
1   2000-01-01    NaN
    2007-01-01    333
    2010-01-01    433
2   2000-01-01    NaN
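For readers on newer pandas releases, where the top-level pd.rolling_sum function is no longer available, the same group-then-roll idea can be written with the .rolling() method. This is only a minimal sketch, not part of the original answer: it assumes Python 3 (so io.StringIO replaces io.BytesIO) and uses groupby(...).transform so the result keeps the original (IDs, timestamp) index. The variable name rolled is just illustrative.

```python
import io

import pandas as pd

# Same sample data as in the question (io.StringIO assumes Python 3).
content = io.StringIO("""\
IDs timestamp value
0 2010-10-30 1
0 2010-11-30 2
0 2011-11-30 3
1 2000-01-01 300
1 2007-01-01 33
1 2010-01-01 400
2 2000-01-01 11""")

df = pd.read_csv(content, sep=r"\s+", parse_dates=["timestamp"])
df = df.set_index(["IDs", "timestamp"])

# Group by ID first, then roll inside each group, so a window
# never spans two different IDs. transform keeps the original index.
rolled = df.groupby(level="IDs")["value"].transform(
    lambda s: s.rolling(window=2).sum()
)
print(rolled)
```

The first row of every ID comes out as NaN because a window of 2 has only one observation there, which matches the output shown in the answer.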