How to get lookback moving average of a timeseries with window based on date in numpy?

I have a timeseries like this:

                  times | data
1994-07-25 15:15:00.000 | 165
1994-07-25 16:00:00.000 | 165

1994-07-26 18:45:00.000 | 165

1994-07-27 15:15:00.000 | 165
1994-07-27 16:00:00.000 | 165

1994-07-28 18:45:00.000 | 165
1994-07-28 19:15:00.000 | 63
1994-07-28 20:35:00.000 | 64
1994-07-28 21:55:00.000 | 64

1994-07-29 14:15:00.000 | 62

1994-07-30 15:35:00.000 | 62
1994-07-30 16:55:00.000 | 61

I would like to compute a lookback moving average on this data, but with a window based on calendar dates, not on a number of rows or an exact datetime span.


For example, with lookback = 3 days, for the row

1994-07-29 14:15:00.000 | 62

its lookback moving average value should be the average of

1994-07-26 18:45:00.000 | 165

1994-07-27 15:15:00.000 | 165
1994-07-27 16:00:00.000 | 165

1994-07-28 18:45:00.000 | 165
1994-07-28 19:15:00.000 | 63
1994-07-28 20:35:00.000 | 64
1994-07-28 21:55:00.000 | 64

Because the lookback is 3 days, the average starts from 1994-07-26 and covers 3 days, no matter how many rows fall within each day.


In addition, rows that share the same date (ignoring the time) should all get the same lookback moving average value.


How can I easily achieve that?

asked Sep 26 '22 08:09 by Jackson Tale


1 Answer

I would use the pandas DatetimeIndex to accumulate the values for each date.

You can then take a rolling mean over the daily values with .rolling(days).mean() to calculate the average you require. (The older pandas.rolling_mean function has been removed from recent pandas releases.)

import numpy as np
import pandas

# Build the sample frame from the question.
df = pandas.DataFrame({'times': np.array(['1994-07-25 15:15:00.000',
                                '1994-07-25 16:00:00.000', 
                                '1994-07-26 18:45:00.000', 
                                '1994-07-27 15:15:00.000', 
                                '1994-07-27 16:00:00.000', 
                                '1994-07-28 18:45:00.000', 
                                '1994-07-28 19:15:00.000', 
                                '1994-07-28 20:35:00.000', 
                                '1994-07-28 21:55:00.000', 
                                '1994-07-29 14:15:00.000', 
                                '1994-07-30 15:35:00.000', 
                                '1994-07-30 16:55:00.000'], dtype='datetime64'),
                       'data': [165,165,165,165,165,165,63,64,64,62,62,61]})
df = df.set_index('times')
# Group rows by calendar date, discarding the time of day.
g = df.groupby(df.index.date)
days = 3
# Sum the values of each date, then average the daily sums over a 3-date window.
g.sum().rolling(days).mean()

This gives:

1994-07-25         NaN
1994-07-26         NaN
1994-07-27  275.000000
1994-07-28  283.666667
1994-07-29  249.333333
1994-07-30  180.333333

You might wish to experiment with the center and min_periods arguments of the rolling mean to get the exact results you want.
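Note that the rolling mean above averages the daily sums and includes the current date in the window. If you want the result from the question's example instead (the average over every row in the 3 dates before the current one), something along these lines might work. It is only a sketch under assumptions of mine: the window excludes the current date, every row is weighted equally, and dates with no rows still count as one of the 3 calendar days. The names lookback, daily, window, daily_avg and lookback_avg are made up for illustration.

```python
import pandas as pd

times = pd.to_datetime([
    '1994-07-25 15:15:00.000', '1994-07-25 16:00:00.000',
    '1994-07-26 18:45:00.000',
    '1994-07-27 15:15:00.000', '1994-07-27 16:00:00.000',
    '1994-07-28 18:45:00.000', '1994-07-28 19:15:00.000',
    '1994-07-28 20:35:00.000', '1994-07-28 21:55:00.000',
    '1994-07-29 14:15:00.000',
    '1994-07-30 15:35:00.000', '1994-07-30 16:55:00.000',
])
df = pd.DataFrame({'data': [165, 165, 165, 165, 165, 165, 63, 64, 64, 62, 62, 61]},
                  index=times)

lookback = 3  # number of calendar days to look back (assumed to exclude the current date)

# Per-date sum and row count; normalize() drops the time of day.
daily = df.groupby(df.index.normalize())['data'].agg(['sum', 'count'])
# Fill in missing calendar dates so they contribute zero rows to the window.
daily = daily.asfreq('D', fill_value=0)

# Totals over the last `lookback` dates; shift(1) moves them down one date
# so that each date only sees the dates strictly before it.
window = daily.rolling(lookback, min_periods=1).sum().shift(1)
daily_avg = window['sum'] / window['count']

# Give every row the value computed for its date.
df['lookback_avg'] = daily_avg.reindex(df.index.normalize()).to_numpy()
print(df)
```

For 1994-07-29 this averages the seven rows from 1994-07-26 through 1994-07-28, and both rows on 1994-07-30 receive the same value. The first date has nothing to look back on, so it comes out as NaN.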

answered Sep 28 '22 22:09 by pbarber