I am new to ARIMA implementation in Python. I have data at 15-minute frequency for a few months. While following the Box-Jenkins method to fit a time series model, I ran into an issue towards the end. The ACF-PACF graphs for the time series (ts) and the differenced series (ts_diff) are given. I used ARIMA(5,1,2) and finally plotted the fitted values (green) and original values (blue). As you can see from the figure, there is a clear shift (by one) in values. What am I doing wrong?
Is the prediction bad? Any insight will be helpful.
1- Check the stationarity of the time series again using the augmented Dickey-Fuller (ADF) test. 2- Try to increase the number of predictors (independent variables). 3- Try to increase the sample size (in the case of monthly data, use at least 4 years of data).
Auto-Regressive Integrated Moving Average (ARIMA) is a time series model that captures the autocorrelation structure in past values and uses it to make predictions. For example, an ARIMA model can forecast future stock prices from previous stock prices. An ARIMA model assumes that the series is stationary once it has been differenced d times; the "integrated" part of the name refers to that differencing step.
This happens when your historical data doesn't have strong seasonality and the forecasting model finds it difficult to predict future data points, so its forecasts settle toward the average of your previous values. That is why you are getting a straight line.
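The flattening described above can be sketched with a hand-computed AR(1) forecast. This is only an illustration under assumed values: the mean `MU`, the coefficient `PHI`, and the helper `ar1_forecast` are hypothetical and do not come from the question's fitted ARIMA(5,1,2) model.

```python
# Hypothetical AR(1) parameters, chosen only for illustration.
MU = 10.0   # long-run mean of the series
PHI = 0.6   # autoregressive coefficient; |PHI| < 1 means stationary


def ar1_forecast(last_value, horizon, mu=MU, phi=PHI):
    """Return the h-step-ahead forecasts for h = 1..horizon.

    For a stationary AR(1), y_t - mu = phi * (y_{t-1} - mu) + e_t,
    the h-step forecast is mu + phi**h * (last_value - mu).
    """
    return [mu + phi ** h * (last_value - mu) for h in range(1, horizon + 1)]


forecasts = ar1_forecast(last_value=20.0, horizon=20)

# The forecast decays geometrically toward MU as the horizon grows,
# which shows up as a nearly flat line in a long-horizon plot.
for h in (1, 2, 5, 10, 20):
    print(f"h={h:2d}  forecast={forecasts[h - 1]:.4f}")
```

The further ahead the forecast reaches, the less the last observation matters, so a multi-step forecast from a model without seasonal terms tends to level off at the mean.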
This is a standard property of one-step ahead prediction or forecasting.
The information used for the forecast is the history up to and including the previous period. A peak, for example, at a period will affect the forecast for the next period, but cannot influence the forecast for the peak period. This makes the forecasts appear shifted in the plot.
A two-step ahead forecast would give the impression of a shift by two periods.
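The apparent shift can be reproduced on simulated data. The sketch below assumes a simple AR(1) process with a known coefficient `PHI` (a hypothetical stand-in, not the ARIMA(5,1,2) model from the question) and compares how far each one-step-ahead forecast is from the value it targets versus the value observed one period earlier.

```python
import random

random.seed(0)

PHI = 0.9   # assumed AR(1) coefficient (hypothetical, for illustration)
N = 2000    # number of simulated observations

# Simulate y_t = PHI * y_{t-1} + e_t with standard normal noise.
series = [0.0]
for _ in range(N - 1):
    series.append(PHI * series[-1] + random.gauss(0.0, 1.0))

previous = series[:-1]                  # y_{t-1}: the latest value known when forecasting
target = series[1:]                     # y_t: the value being forecast
forecast = [PHI * y for y in previous]  # one-step-ahead forecast of y_t


def mean_abs_gap(a, b):
    """Average absolute difference between two equally long sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


gap_to_target = mean_abs_gap(forecast, target)
gap_to_previous = mean_abs_gap(forecast, previous)

print(f"mean |forecast - y_t|     = {gap_to_target:.3f}")
print(f"mean |forecast - y_(t-1)| = {gap_to_previous:.3f}")
```

In this simulation the forecast sits closer to the previous observation than to the value it is predicting, which is what makes the fitted curve look like a copy of the data moved one step to the right.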
Just to confirm, I am doing this right then? Here is the code I used.
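import pandas as pd
# Legacy ARIMA interface; it was deprecated and later removed in newer statsmodels releases.
from statsmodels.tsa.arima_model import ARIMA
# ts is the 15-minute series from the question, indexed by login_time.
model = ARIMA(ts, order=(5, 1, 2))
results = model.fit()
# typ='levels' returns the in-sample predictions on the original (undifferenced) scale.
results_ARIMA = results.predict(typ='levels')
concatenated = pd.concat([ts, results_ARIMA], axis=1, keys=['original', 'predicted'])
concatenated.head(10)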
                     original  predicted
login_time
1970-01-01 20:00:00         2        NaN
1970-01-01 20:15:00         6   2.000186
1970-01-01 20:30:00         9   4.552971
1970-01-01 20:45:00         7   7.118973
1970-01-01 21:00:00         1   7.099769
1970-01-01 21:15:00         4   3.624975
1970-01-01 21:30:00         0   3.867454
1970-01-01 21:45:00         4   1.618120
1970-01-01 22:00:00         9   2.997275
1970-01-01 22:15:00         8   6.300015
In the model you specify (5, 1, 2), you set d = 1. This means that the model is fitted to the first-differenced series, that is, to the change from one observation to the next rather than to the raw values. Differencing is used to remove trend so that the series is closer to stationary.
Sometimes, setting d to 1 will result in an ACF / PACF plot with fewer and / or less pronounced spikes (i.e. less autocorrelation left in the series). In such cases, a model fitted to the differenced series will usually give predictions that deviate less dramatically from your observations than a model fitted without differencing.
First-order differencing is computed as Y'(t) = Y(t) - Y(t-1), where Y(t) refers to the observed value Y at time index t; an order of differencing d means this operation is applied d times. Each pass drops one observation, so you lose d data points at the left edge of your time series, which is why the first predicted value in your output is NaN. Note that losing points at the start does not by itself displace the remaining fitted values; the fitted curve trailing the peaks by one period is the usual one-step-ahead effect.
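A small sketch of what differencing of order d does to the length and values of a series, using the first ten "original" values from the output above. The helper `difference` is hypothetical and written in plain Python for illustration. Order d is taken here to mean applying the first difference d times, which is the usual ARIMA convention and is not the same as subtracting the value d steps back.

```python
def difference(values, order=1):
    """Apply first differencing (y[t] - y[t-1]) `order` times."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# First ten 'original' values from the question's output.
y = [2, 6, 9, 7, 1, 4, 0, 4, 9, 8]

d1 = difference(y, order=1)   # first difference, as used with d = 1
d2 = difference(y, order=2)   # difference of the first difference (d = 2)

# Lag-2 difference y[t] - y[t-2], shown only for contrast with d2.
lag2 = [y[t] - y[t - 2] for t in range(2, len(y))]

print(len(y), len(d1), len(d2))   # each pass drops one observation
print(d1)
print(d2)
print(lag2)
```

Each round of differencing shortens the series by one point at the start, and the second-order difference differs from the lag-2 difference.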
This page may offer a more elaborate explanation (make sure to click around a bit and explore the other pages on there if you want a treatment of the whole process of fitting an ARIMA model).
Hope this helps (or at least puts your mind at ease about the shift)!
Bests,
Evert