Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this time series stationary or not?

I want to check the stationary of a time series data saved in TS.csv.

However, R's tseries::adf.test() and Python's statsmodels.tsa.stattools.adfuller() give completely different results.

adf.test() shows it's stationary (p < 0.05), while adfuller() shows it's non-stationary (p > 0.05).

Is there any problems in the following codes?

What's the right process to test stationary of a time series in R and Python?

Thanks.

R codes:

> rd <- read.table('Data/TS.csv', sep = ',', header = TRUE)
> inp <- ts(rd$Sales, frequency = 12, start = c(1965, 1))
> inp
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1965 154  96  73  49  36  59  95 169 210 278 298 245
1966 200 118  90  79  78  91 167 169 289 347 375 203
1967 223 104 107  85  75  99 135 211 335 460 488 326
1968 346 261 224 141 148 145 223 272 445 560 612 467
1969 518 404 300 210 196 186 247 343 464 680 711 610
1970 613 392 273 322 189 257 324 404 677 858 895 664
1971 628 308 324 248 272
> library(tseries)
> adf.test(inp)

    Augmented Dickey-Fuller Test

data:  inp
Dickey-Fuller = -7.2564, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

Python codes (from Time_Series.ipynb):

import pandas as pd
from statsmodels.tsa.stattools import adfuller
df = pd.read_csv('Data/TS.csv')
ts = pd.Series(list(df['Sales']), index=pd.to_datetime(df['Month'],format='%Y-%m'))
s_test = adfuller(ts, autolag='AIC')
print("p value > 0.05 means data is non-stationary: ", s_test[1])
# output: p value > 0.05 means data is non-stationary:  0.988889420517

Update

@gfgm give exellent explanations why results of R and Python are different, and how to make them the same by changing the parameters.

For the second quetsion above: "What's the right process to test stationary of a time series in R and Python?". I'd like to provide some details:

When forecast a time series, ARIMA model needs the input time series to be stationary. If the input isn't stationary, it should be log()ed or diff()ed to make it stationary, then fit it into the model.

So the problem is: should I think the input is stationary (with R's default parameters) and fit it directly into ARIMA model, or think it's non-stationary (with Python's default parameters), and make it stationary with extra functions (like log() or diff())?

like image 642
Leo Avatar asked Mar 27 '18 06:03

Leo


People also ask

How do you know if a time series is stationary?

If Test statistic < Critical Value and p-value < 0.05 – Reject Null Hypothesis(HO) i.e., time series does not have a unit root, meaning it is stationary. It does not have a time-dependent structure.

What makes a time series stationary?

Stationary Time Series Time series are stationary if they do not have trend or seasonal effects. Summary statistics calculated on the time series are consistent over time, like the mean or the variance of the observations.

What is a stationary time series example?

A stationary process' distribution does not change over time. An intuitive example: you flip a coin. 50% heads, regardless of whether you flip it today or tomorrow or next year. A more complex example: by the efficient market hypothesis, excess stock returns should always fluctuate around zero.

Which process is not stationary?

Examples of non-stationary processes are random walk with or without a drift (a slow steady change) and deterministic trends (trends that are constant, positive, or negative, independent of time for the whole life of the series).


2 Answers

The results are different because the models being fit are slightly different and because the lag order of the models are completely different. The python test includes a constant 'drift' term (a constant is estimated thus centering the time series at zero), but the R test includes both a constant and a linear trend term. This can be specified in the python code with the argument regression = 'ct'.

Default lag length in r

nlag = trunc((length(x)-1)^(1/3))

Default lag length in python

12*(nobs/100)^(1/4)

When you ran the python code you told the function to pick optimal lag-length by AIC criteria. If we tell python to run a centered and detrended model, and we tell it to use the R lag-length criteria, we get:

In [5]: adfuller(ts, regression="ct", maxlag = 4)[1]
Out[5]: 3.6892966741832268e-09

It's hard to see if this is identical to the R result, as R rounds its p-value to .01, but we can tell R to use python's lag length, and python to use R's model (I cant change model in R with this function). We get:

adf.test(inp, k = ceiling(12*(length(inp)/100)^(1/4)))

    Augmented Dickey-Fuller Test

data:  inp
Dickey-Fuller = -2.0253, Lag order = 12, p-value = 0.5652
alternative hypothesis: stationary

And in python:

In [6]: adfuller(ts, regression="ct")[1]
Out[6]: 0.58756464088883864

Not perfect, but pretty close.

Note:

The actual Dickey-Fuller test-statistic for the python model is

In [8]: adfuller(ts, regression="ct")[0]
Out[8]: -2.025340637385288

which is identical to the R result. The tests probably use different ways of computing the p-value from the stat.

like image 164
gfgm Avatar answered Oct 28 '22 13:10

gfgm


The p-values of the Augmented Dickey-Fuller test are rather sensitive to the choice of lag order. For example, here is the same test in R with a higher lag order:

> adf.test(rd$Sales, k=9)

    Augmented Dickey-Fuller Test

data:  rd$Sales
Dickey-Fuller = -2.9186, Lag order = 9,
p-value = 0.2004
alternative hypothesis: stationary

The documentation for the adf.test says that it uses regression with a constant and linear trend. We should pass the parameter regression = 'ct' to adfuller to use the same regression method.

I'm having some trouble with statsmodels on my machine at the moment, but I suggest you try the following parameters and see if you get closer correspondence:

adfuller(a, maxlag=9, autolag=None, regression='ct')

What you want to look for is whether the two are showing the same test statistic because the p-values are determined differently between the two packages.

like image 35
znr Avatar answered Oct 28 '22 13:10

znr