I am using the ets() and auto.arima() functions from the forecast package to predict future values in R. Which criteria should I use to choose the better of the two models?
Following is the accuracy output from ets (data.ets) and auto.arima (data.ar).
> accuracy(data.ets)
        ME      RMSE       MAE       MPE      MAPE      MASE
 0.6995941 4.1325246 3.2634246 0.5402465 2.7777897 0.5573740
> accuracy(data.ar)
        ME      RMSE       MAE       MPE      MAPE      MASE
-0.8215465 4.3640818 3.1070931 -0.7404200 2.5783128 0.5306735
The AIC of each model is as follows:
> ETSfit$aic
[1] 613.8103
> ARIMAfit$aic
[1] 422.5597
Following is the fitted model of both ets and auto.arima
> ETSfit
ETS(A,N,A)
Call:
ets(y = data.ts)
Smoothing parameters:
alpha = 0.5449
gamma = 1e-04
Initial states:
l = 95.8994
s=6.3817 -3.1792 6.8525 3.218 -3.4445 -1.2408
-4.5852 0.4434 1.7133 0.8123 -1.28 -5.6914
sigma: 4.1325
     AIC     AICc      BIC
613.8103 620.1740 647.3326
> ARIMAfit
Series: data.ts
ARIMA(1,1,1)(0,1,1)[12]
Coefficients:
         ar1      ma1     sma1
      0.3808  -0.7757  -0.7276
s.e.  0.1679   0.1104   0.2675
sigma^2 estimated as 22.68: log likelihood=-207.28
AIC=422.56 AICc=423.19 BIC=431.44
Kindly help.
In this case the ETS model seems to be the slightly more accurate model based on the test set RMSE, MAPE and MASE. Notice that the ARIMA model fits the training data slightly better than the ETS model, but that the ETS model provides more accurate forecasts on the test set.
For ETS models, Akaike's Information Criterion (AIC) is defined as $\mathrm{AIC} = -2\log(L) + 2k$, where L is the likelihood of the model and k is the total number of parameters and initial states that have been estimated (including the residual variance).
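As a quick arithmetic check, the same formula can be applied to the ARIMA printout quoted in the question. This is only a sketch: it assumes k = 4 for that model (the three coefficients ar1, ma1 and sma1 plus the residual variance) and uses the log likelihood exactly as printed.

```r
# Values copied from the ARIMAfit printout in the question.
loglik <- -207.28
k <- 4  # assumption: ar1, ma1, sma1 and the residual variance

# AIC = -2 log(L) + 2k
aic <- -2 * loglik + 2 * k
aic
```

Under that assumption the result is 422.56, which agrees with the AIC reported for ARIMAfit. Reproducing the number within one model class does not make it comparable with the AIC of a model from a different class.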
To select the best ARIMA model, the data are split into two periods: an estimation period and a validation period. The model with the smallest values of the selection criteria is considered the best model. Hence, ARIMA(2,1,2) is found to be the best model for forecasting the SPL data series.
As far as I can tell, the only difference between ARIMA and exponential smoothing models is the way weights are assigned to past lagged values and the error term. In that case, exponential smoothing should be considered much better than ARIMA because of its weight-assignment method.
You are showing in-sample accuracy measures which are hard to compare without knowing how many parameters are in each model. Also, the AIC values are not comparable between these model classes.
The simplest approach is to use a test set that is not used for model selection or estimation, and then compare accuracy of the forecasts on the test set.
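A minimal sketch of that test-set comparison is shown here. It assumes the forecast package is installed and uses the built-in AirPassengers series as a stand-in for data.ts; the 24-month hold-out and the object names are illustrative choices, not anything taken from the question.

```r
library(forecast)

# Stand-in monthly series (1949-1960); replace with your own ts object.
y <- AirPassengers

# Hold out the last 24 months as a test set.
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))

# Select and estimate both models on the training data only.
fit_ets   <- ets(train)
fit_arima <- auto.arima(train)

# Forecast over the whole test period.
h <- length(test)
fc_ets   <- forecast(fit_ets, h = h)
fc_arima <- forecast(fit_arima, h = h)

# accuracy() returns one "Training set" row and one "Test set" row.
acc_ets   <- accuracy(fc_ets, test)
acc_arima <- accuracy(fc_arima, test)

# Compare out-of-sample measures side by side.
measures <- c("RMSE", "MAE", "MAPE", "MASE")
rbind(ETS   = acc_ets["Test set", measures],
      ARIMA = acc_arima["Test set", measures])
```

Only the "Test set" rows are compared, because the training-set rows are the same kind of in-sample measures shown in the question.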
A more sophisticated version of that is to use time series cross-validation, as described at http://otexts.com/fpp/2/5/.
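One way to sketch time series cross-validation is with tsCV() from the forecast package, assuming a version of the package that includes it. The AirPassengers series, the one-step horizon, and initial = 36 are illustrative assumptions; note that each model is re-selected and re-estimated at every forecast origin, so this can be slow.

```r
library(forecast)

# Stand-in monthly series; replace with your own ts object.
y <- AirPassengers

# tsCV() expects a function of the form f(x, h) that returns a forecast object.
f_ets   <- function(x, h) forecast(ets(x), h = h)
f_arima <- function(x, h) forecast(auto.arima(x), h = h)

# One-step-ahead forecast errors from a rolling forecast origin.
# initial = 36 skips origins with fewer than 36 observations (arbitrary choice).
e_ets   <- tsCV(y, f_ets, h = 1, initial = 36)
e_arima <- tsCV(y, f_arima, h = 1, initial = 36)

# Cross-validated RMSE; errors at skipped origins are NA and are dropped.
rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
c(ETS = rmse(e_ets), ARIMA = rmse(e_arima))
```

The model class with the smaller cross-validated error would be preferred for that horizon; repeating the comparison with a larger h shows whether the ranking holds for longer-range forecasts.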