
ARIMA modeling on time-series dataframe python

I'm trying to use an ARIMA model for forecasting, and I'm new to it. I plotted the seasonal_decompose() output for my dataset (hourly data); the plot is below.

[Plot: seasonal_decompose() output for the hourly dataset]

I want to understand these plots, so a brief description would be helpful. It looks to me as if there is no trend initially and an upward trend appears after some time, but I'm not sure I'm reading this correctly. How should these graphs be read?

When I applied the Dickey-Fuller test to check whether my data is stationary and whether it needs further differencing, I got the results below:

Test Statistic                   -4.117543
p-value                           0.000906
Lags Used                       30.000000
Number of Observations Used    4289.000000
Critical Value (1%)              -3.431876
Critical Value (5%)              -2.862214
Critical Value (10%)             -2.567129

I'm referring to two links to understand this: http://www.seanabu.com/2016/03/22/time-series-seasonal-ARIMA-model-in-python/

This link says that when the test statistic is greater than the critical value, the data is stationary; the other link says the opposite. I'm confused by this. I also looked at otexts.org, which says the decision should be based on the p-value. How should I interpret the results of the ADF test?

Also, I tried to fit an ARIMA model to the dataset:

from statsmodels.tsa.arima_model import ARIMA
model = ARIMA(df.y, order=(0,1,0))
model_fit = model.fit()

My dataframe has a datetime column as its index, and the y column holds float values. When I apply the model to this dataframe, I get the following error:

IndexError: list index out of range.

The error is raised when I try to print the model summary using:

print(model_fit.summary())

Please help me with this so that I can get a better understanding of ARIMA.

Ashag asked Dec 18 '22 06:12



1 Answer

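Cross-validation for ARIMA (AutoRegressive Integrated Moving Average) time series: standard K-fold cross-validation does not work for time series, because randomly assigned folds let the model train on observations that come after the ones it is tested on. Instead, use backtesting techniques such as walk-forward validation and rolling windows.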
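As a rough illustration of walk-forward validation, here is a minimal sketch in plain Python. It is not tied to any library: the function name walk_forward_mae, the min_train parameter, and the sample data are all made up for this example, and the forecaster is a simple random-walk forecast (repeat the last observed value, which is what an ARIMA(0,1,0) model without a drift term predicts) standing in for a fitted ARIMA model.

```python
def walk_forward_mae(series, min_train=3):
    """Walk-forward (expanding window) validation with a naive forecaster.

    At each step the forecaster only sees observations that come before
    the point being predicted, so the temporal order is never violated.
    Returns the mean absolute error of the one-step-ahead forecasts.
    """
    errors = []
    for t in range(min_train, len(series)):
        train = series[:t]       # everything observed so far
        forecast = train[-1]     # random-walk forecast: repeat the last value
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)


# Hypothetical sample data, for illustration only
y = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
print(walk_forward_mae(y))
```

To backtest a real model, the line that computes forecast would be replaced by refitting the model on train and asking it for a one-step-ahead prediction. A rolling window would use only the last few observations (for example series[t - window:t]) instead of series[:t].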

K-fold cross-validation for autoregression: although cross-validation is usually not valid for time-series (ARIMA) models, K-fold does work for autoregressions as long as the models considered have uncorrelated errors, which you can check with the Ljung-Box test. This is relevant for XAI (Explainable Artificial Intelligence) in time-series use cases.
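To make the Ljung-Box check more concrete, the sketch below computes the Q statistic by hand, Q = n(n+2) * sum over k of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation of the residuals. The helper names autocorr and ljung_box_q and the sample residuals are invented for this example. In practice you would probably use a library routine instead (statsmodels provides acorr_ljungbox in statsmodels.stats.diagnostic, which also reports the p-value).

```python
def autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(residuals, lags):
    """Ljung-Box Q statistic over lags 1..lags."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )


# Hypothetical residuals that alternate in sign, i.e. clearly autocorrelated
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
q = ljung_box_q(residuals, lags=1)

# Under the null of no autocorrelation, Q is approximately chi-squared with
# `lags` degrees of freedom; the 5% critical value for 1 degree is about 3.84.
print(q, q > 3.84)
```

A Q value above the critical value suggests the residuals are still autocorrelated, in which case the uncorrelated-errors condition above would not hold.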

A few Python statistics libraries provide these methods; two of them are Python Stats Tests and Python StatsModels.

To get the difference between two lists of values, you can use a set difference. You can also enforce int8 values using Python 3.6+ PEP 487 descriptors, so that a typed list always returns int8s, which can make computation faster as well (list : list -> list of ints):

list_a = [1, 2, 3]
list_b = [2, 3]
# Elements of list_a that do not appear in list_b
print(set(list_a).difference(list_b))
# prints {1} on Python 3
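
If the "diff" you need is the differencing step of ARIMA (the d in order=(p, d, q)) rather than a set difference, a minimal sketch of first-order differencing on a plain list might look like the following. The helper name first_difference and the sample values are made up for illustration; on a pandas Series, Series.diff() does the same job and leaves a NaN in the first position.

```python
def first_difference(values):
    """Return consecutive differences: values[t] - values[t - 1]."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Hypothetical sample data, for illustration only
y = [10.0, 12.0, 11.0, 13.0]
print(first_difference(y))
```

The differenced series is one element shorter than the input, and it is this differenced series that a stationarity test would be rerun on when deciding whether more differencing is needed.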
joe hoeller answered Dec 28 '22 08:12
