In statsmodels, I would like to use more than one exogenous (external) variable with the SARIMAX or ARIMA model. For example, I want to predict yield at time t using an AR term of lag 3 on the yield time series, an AR term of lag 4 on a weather temperature time series, and an AR term of lag 3 on a market price series. This doesn't appear to be possible. Are there any examples or explanations of how this can be done?
First of all, you have to define your exogenous input as an array-like structure with dimensions nobs x k, where nobs is the number of endogenous observations (i.e. the length of your time series) and k is the number of exogenous variables. Supposing that you use an ndarray for this purpose, you may begin with something like
import numpy as np
exog = np.empty([nobs, k])
and then fill it with the values of your exogenous variables. Then, you define your model as in the following example:
import statsmodels.api as sm
model = sm.tsa.SARIMAX(endog=series, exog=exog, order=order, seasonal_order=seasonal_order).fit(start_params=[0, 0, 0, 0, 0, 1])
where series is your original time series, exog is the exogenous input, order is a (p,d,q) tuple and seasonal_order is a (P,D,Q,s) tuple. Pay attention to the start_params list, which I found essential for successfully building the SARIMAX model in my case.
When I did not use any exogenous input, the start_params list was start_params = [0, 0, 0, 1] for (p,d,q) = (1,0,0) and (P,D,Q,s) = (1,0,0,37).
When I added 3 new exogenous inputs, I set the start_params list to start_params = [0, 0, 0, 0, 1, 1], which has 2 additional elements.
From this I supposed that adding k exogenous inputs requires k - 1 additional elements in start_params, but I have not checked this thoroughly and do not know it for sure. In general, SARIMAX estimates one regression coefficient per column of exog (with the default mle_regression=True), so the safest approach is to build the model first and check the expected parameters with model.param_names before passing start_params.
Hope it helps. Cheers.