I followed this tutorial to study the SARIMAX model: https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3. The data covers 1958-2001.
import statsmodels.api as sm

mod = sm.tsa.statespace.SARIMAX(y,
                                order=(1, 1, 1),
                                seasonal_order=(1, 1, 1, 12),
                                enforce_stationarity=False,
                                enforce_invertibility=False)
results = mod.fit()
When fitting the ARIMA time series model, I found that the author used the full date range of the data to fit the model parameters. But when validating the forecasts, the author predicted from 1998-01-01 onward, which is part of the same date range that was used to fit the model.
pred = results.get_prediction(start=pd.to_datetime('1998-01-01'), dynamic=False)
I know that in machine learning the training data and the validation (test) data are different, that is, they cover different ranges. Is the author right? Why do it this way (I mean, why use all the data for training)? I am new to the SARIMAX model.
Could you tell me more about this model, for example how to predict days or weeks rather than just months? That is, how should I set the parameters order=(1, 1, 1) and seasonal_order=(1, 1, 1, 12)? Thanks!
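The author is right. When you fit a regression (linear, higher-order or logistic, it doesn't matter), it is perfectly fine for the fitted values to deviate from your training data (for instance, logistic regression may give you a false positive even on training data).
The same holds for time series. With dynamic=False, each predicted point is a one-step-ahead forecast that uses the observed values up to that point, so comparing these predictions with the data from 1998 onward is an in-sample check of the fit rather than a separate test set. I think the author wanted to show this way that the model is built correctly.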
seasonal_order=(1, 1, 1, 12)
If you look at the statsmodels tsa documentation, you will see that the last parameter (s) is the number of periods in one season: for quarterly data you set it to 4, for monthly data to 12. It means that for weekly data with a yearly cycle seasonal_order should look like this
seasonal_order=(1, 1, 1, 52)
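For daily data with a yearly cycle it will be the following (for a weekly pattern in daily data, s would be 7 instead)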
seasonal_order=(1, 1, 1, 365)
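Since s is the number of observations in one seasonal cycle, it depends both on how often you sample and on how long the cycle is. As a rough illustration, here is a small lookup; SEASONAL_PERIODS and seasonal_order_for are hypothetical names made up for this sketch and are not part of statsmodels.

```python
# Hypothetical lookup: observations per seasonal cycle for common setups.
SEASONAL_PERIODS = {
    ("quarterly", "yearly"): 4,
    ("monthly", "yearly"): 12,
    ("weekly", "yearly"): 52,
    ("daily", "weekly"): 7,
    ("daily", "yearly"): 365,
}


def seasonal_order_for(frequency, cycle, P=1, D=1, Q=1):
    """Return a (P, D, Q, s) tuple for a sampling frequency and cycle length."""
    s = SEASONAL_PERIODS[(frequency, cycle)]
    return (P, D, Q, s)


print(seasonal_order_for("monthly", "yearly"))  # (1, 1, 1, 12)
print(seasonal_order_for("daily", "weekly"))    # (1, 1, 1, 7)
```

The resulting tuple can be passed as seasonal_order to SARIMAX. The P, D and Q defaults of 1 simply mirror the values in the question and still have to be chosen for your own data.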
The order component holds the non-seasonal parameters p, d and q, respectively. You have to choose them depending on how your data behaves.
Here is a good answer on how to find the non-seasonal component values
The author of the blog set those parameters because: "The output of our code suggests that SARIMAX(1, 1, 1)x(1, 1, 1, 12) yields the lowest AIC."