I have multiple timeframes I want to track and predict values for (ARIMA forecasting 1 period ahead per timeframe), and my algorithm retrieves data on the lowest timeframe. Note: each timeframe is evenly (but not necessarily symmetrically) divisible by lower timeframes. As an important side note, intraday timeframes (1H, 4H) should start at 9:30am (market open time).
1min, 5min, 15min, 30min, 1H, 4H, 1D, 1W, 1M
First, the algorithm does a bulk history call and gets all 1-minute data within a given timespan (warm-up data). Then, during the course of its runtime (it never stops; it's a trading algorithm), it receives new data every minute.
The first step (after receiving the bulk historical data) is to aggregate it for each higher timeframe, building each timeframe from the next-lower timeframe's already-aggregated data so that every step works on a smaller dataset and saves processing time (see the sketch after the list):
i.e.,
1min timeframe data = 1min data (historical bulk data)
5min timeframe data = aggregated 1min timeframe data to 5min
15min timeframe data = aggregated 5min timeframe data to 15min
etc...
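To make the intent concrete, here is a rough, untested sketch of the cascading step (the aggregate helper is hypothetical; implementing it is essentially what Q1 below asks):
# Ordered from lowest to highest timeframe.
TIMEFRAMES = ["1min", "5min", "15min", "30min", "1H", "4H", "1D", "1W", "1M"]

def aggregate(df, freq):
    # Hypothetical helper: roll OHLC bars up to `freq` (see Q1 below).
    ...

def build_all_timeframes(minute_df):
    # Each higher timeframe is built from the next-lower timeframe's bars,
    # so every step operates on a progressively smaller dataset.
    data = {"1min": minute_df}
    for lower, higher in zip(TIMEFRAMES, TIMEFRAMES[1:]):
        data[higher] = aggregate(data[lower], higher)
    return data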
Given the following example dataset (and assume there could be other symbols included):
close high low open
symbol time
SPY 2018-05-22 09:31:00 270.245900 270.374166 270.226167 270.305100
2018-05-22 09:32:00 270.344566 270.374166 270.206434 270.245900
2018-05-22 09:33:00 270.374166 270.374166 270.314966 270.344566
2018-05-22 09:34:00 270.275500 270.374166 270.245900 270.374166
2018-05-22 09:35:00 270.413632 270.443232 270.275500 270.275500
2018-05-22 09:36:00 270.502431 270.522165 270.384032 270.413632
2018-05-22 09:37:00 270.541898 270.591231 270.492565 270.502431
Q1: How do I aggregate a multi-index pandas dataframe such that the time index and columns are aggregated per symbol? A partial, untested (and likely not working) solution illustrating my intent follows:
# Where timeframe.Frequency is a string such as "1H" or "1min"
df.resample(timeframe.Frequency).agg({"open": "first", "close": "last", "low": "min", "high": "max"})
A1: df.groupby(['symbol', pd.Grouper(freq=timeframe.Frequency, level='time')]).agg({'open':'first', 'close':'last', 'high':'max', 'low':'min'})
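For example, here is a minimal self-contained sketch of A1, rebuilding the sample rows from the question (values rounded) and using a literal "5min" in place of timeframe.Frequency:
import pandas as pd

# Rebuild the example (symbol, time) multi-index frame from the question.
times = pd.date_range("2018-05-22 09:31", periods=7, freq="1min")
idx = pd.MultiIndex.from_product([["SPY"], times], names=["symbol", "time"])
df = pd.DataFrame({
    "close": [270.2459, 270.3446, 270.3742, 270.2755, 270.4136, 270.5024, 270.5419],
    "high":  [270.3742, 270.3742, 270.3742, 270.3742, 270.4432, 270.5222, 270.5912],
    "low":   [270.2262, 270.2064, 270.3150, 270.2459, 270.2755, 270.3840, 270.4926],
    "open":  [270.3051, 270.2459, 270.3446, 270.3742, 270.2755, 270.4136, 270.5024],
}, index=idx)

# Group by symbol, then bucket the 'time' level into 5-minute bins.
bars_5min = (df.groupby(["symbol", pd.Grouper(freq="5min", level="time")])
               .agg({"open": "first", "close": "last",
                     "high": "max", "low": "min"}))
print(bars_5min)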
Q2: Given that there will be partially completed 'bars' for higher timeframes (e.g., at 9:37am the 5-minute timeframe will contain a full bar for 9:35am plus a partial bar represented as if it were the 9:40am bar), is it wise to keep partial data in a given timeframe's dataset, or should I use QuantConnect's consolidator to complete each bar before adding it to the dataset during runtime? And how do I account for this partial bar data when processing history?
A2: Partial bars are indeed created when resampling. For example, I may have only a single 1-minute data point after June 1st, and my monthly timeframe dataframe will then have a June 30th entry whose OHLC values are those of that single 1-minute bar. At this point I can either remove the row or just keep updating it with new values as data comes in.
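A rough sketch of the "remove the row" option, assuming bars are labelled by their bin start (as in the groupby/Grouper aggregation above); the helper name and the cutoff convention are my own:
import pandas as pd

def drop_partial_bars(bars, cutoff, freq):
    # Keep only bars whose full period [start, start + freq) ends on or before
    # `cutoff`, i.e. the time up to which 1-minute data is known to be complete.
    # Exactly where `cutoff` falls depends on whether your minute bars are
    # stamped at their open or at their close.
    starts = bars.index.get_level_values("time")
    ends = starts + pd.tseries.frequencies.to_offset(freq)
    return bars[ends <= cutoff]

# e.g. if 1-minute data is complete up to (but not including) 09:38, the
# partially built 09:35 five-minute bar is dropped:
# drop_partial_bars(bars_5min, pd.Timestamp("2018-05-22 09:38"), "5min")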
For your first question, assuming that the data is indexed by symbol and time:
(df.groupby(['symbol', df.index.get_level_values(1).floor('5T')])
   .agg({"open": "first",
         "close": "last",
         "low": "min",
         "high": "max"})
)
gives:
open close low high
symbol time
SPY 2018-05-22 09:30:00 270.305 270.276 270.206 270.374
SPY 2018-05-22 09:35:00 270.276 270.542 270.276 270.591
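If you also need the 1H and 4H bins to start at 9:30am (as mentioned in the question), the same floor trick can be shifted by the session open's offset within the bin; a rough, untested sketch (the helper name and hard-coded session open are illustrative):
import pandas as pd

def floor_to_session_bins(times, freq, session_open="09:30"):
    # Work out how far the session open sits past a "natural" bin boundary
    # (30min for 1H bins, 1h30min for 4H bins), shift by that amount, floor,
    # then shift back, so e.g. 1H bins fall on 09:30, 10:30, ...
    anchor = pd.Timestamp(f"2000-01-03 {session_open}")  # any date works for intraday freqs
    offset = anchor - anchor.floor(freq)
    return (times - offset).floor(freq) + offset

# Hourly bars anchored at 09:30, 10:30, ...
hourly = (df.groupby(["symbol",
                      floor_to_session_bins(df.index.get_level_values(1), "1H")])
            .agg({"open": "first", "close": "last",
                  "low": "min", "high": "max"}))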
The second question is too qualitative and depends very much on your data as well as your sampling windows.