
Understanding output from statsmodels grangercausalitytests

I'm new to Granger causality and would appreciate any advice on understanding/interpreting the results of the Python statsmodels output. I've constructed two data sets (sine functions shifted in time, with noise added; plot of the two signals omitted)

and put them in a "data" matrix with signal 1 as the first column and signal 2 as the second. I then ran the tests using:

granger_test_result = sm.tsa.stattools.grangercausalitytests(data, maxlag=40, verbose=True)
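For reference, here is a minimal, self-contained sketch of the kind of setup described above; the series length, the 25-sample shift, the sine period, and the noise level are illustrative assumptions, not the exact values used:

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n, shift = 1000, 25                      # assumed series length and time shift

t = np.arange(n + shift)
base = np.sin(2 * np.pi * t / 100.0)     # assumed period of 100 samples
signal1 = base[shift:] + 0.1 * rng.standard_normal(n)  # leads signal2 by `shift` points
signal2 = base[:n] + 0.1 * rng.standard_normal(n)      # delayed copy of the same sine

# Column 0 is the series being predicted; column 1 is the candidate "cause".
data = np.column_stack([signal1, signal2])
granger_test_result = grangercausalitytests(data, maxlag=40, verbose=True)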

The results showed that the optimal lag (in terms of the highest F-test value) was a lag of 1.

Granger Causality
('number of lags (no zero)', 1)
ssr based F test:         F=96.6366 , p=0.0000  , df_denom=995, df_num=1
ssr based chi2 test:   chi2=96.9280 , p=0.0000  , df=1
likelihood ratio test: chi2=92.5052 , p=0.0000  , df=1
parameter F test:         F=96.6366 , p=0.0000  , df_denom=995, df_num=1

However, the lag that seems to best describe the optimal overlap of the data is around 25 (in the omitted plot, signal 1 has been shifted to the right by 25 points to align with signal 2):

Granger Causality
('number of lags (no zero)', 25)
ssr based F test:         F=4.1891  , p=0.0000  , df_denom=923, df_num=25
ssr based chi2 test:   chi2=110.5149, p=0.0000  , df=25
likelihood ratio test: chi2=104.6823, p=0.0000  , df=25
parameter F test:         F=4.1891  , p=0.0000  , df_denom=923, df_num=25

I'm clearly misinterpreting something here. Why wouldn't the predicted lag match up with the shift in the data?

Also, can anyone explain to me why the p-values are so small as to be negligible for most lag values? They only begin to show up as non-zero for lags greater than 30.

Thanks for any help you can give.

Asked Aug 09 '18 by Wilhelm



1 Answer

As stated here, in order to run a Granger causality test the time series you are using must be stationary. A common way to achieve this is to take the first difference of each series:

import numpy as np

# First difference of each series (np.diff drops one point; the [1:] drops one more)
x = np.diff(x)[1:]
y = np.diff(y)[1:]
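As a quick sanity check (not part of the original answer), one way to confirm the differenced series really are stationary before re-running the Granger test is an augmented Dickey-Fuller test; a minimal sketch:

from statsmodels.tsa.stattools import adfuller

def report_adf(series, name, alpha=0.05):
    # Small ADF p-value -> reject a unit root, i.e. the series looks stationary.
    pvalue = adfuller(series)[1]
    verdict = "looks stationary" if pvalue < alpha else "may still be non-stationary"
    print(f"{name}: ADF p-value = {pvalue:.4f} ({verdict})")

report_adf(x, "x (1st difference)")
report_adf(y, "y (1st difference)")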

Here is a comparison of the Granger causality results at lag 1 and lag 25 for a similar dataset I generated:

Unchanged

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=19.8998 , p=0.0000  , df_denom=221, df_num=1
ssr based chi2 test:   chi2=20.1700 , p=0.0000  , df=1
likelihood ratio test: chi2=19.3129 , p=0.0000  , df=1
parameter F test:         F=19.8998 , p=0.0000  , df_denom=221, df_num=1

Granger Causality
number of lags (no zero) 25
ssr based F test:         F=6.9970  , p=0.0000  , df_denom=149, df_num=25
ssr based chi2 test:   chi2=234.7975, p=0.0000  , df=25
likelihood ratio test: chi2=155.3126, p=0.0000  , df=25
parameter F test:         F=6.9970  , p=0.0000  , df_denom=149, df_num=25

1st Difference

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=0.1279  , p=0.7210  , df_denom=219, df_num=1
ssr based chi2 test:   chi2=0.1297  , p=0.7188  , df=1
likelihood ratio test: chi2=0.1296  , p=0.7188  , df=1
parameter F test:         F=0.1279  , p=0.7210  , df_denom=219, df_num=1

Granger Causality
number of lags (no zero) 25
ssr based F test:         F=6.2471  , p=0.0000  , df_denom=147, df_num=25
ssr based chi2 test:   chi2=210.3621, p=0.0000  , df=25
likelihood ratio test: chi2=143.3297, p=0.0000  , df=25
parameter F test:         F=6.2471  , p=0.0000  , df_denom=147, df_num=25

I'll try to explain what is happening conceptually. Because the series you are using have a clear trend in mean, the early lags (1, 2, and so on) all give significant predictive models in the F test: the long-term trend makes it very easy to (negatively) correlate the x values one lag away with the y values. Additionally (this one is more of an educated guess), I think the reason the F statistic for lag 25 is so low compared to the early lags is that much of the variance explained by the x series is already contained in the autocorrelation of y over lags 1-25, since the non-stationarity gives the autocorrelation more predictive power.
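As a side note, rather than reading the verbose printout, you can scan the dictionary that grangercausalitytests returns to compare the F statistic and p-value at every lag; a minimal sketch, assuming granger_test_result is the object from the question:

# Each key is a lag; each value is (dict of test results, list of fitted OLS models).
# The 'ssr_ftest' entry is a tuple: (F statistic, p-value, df_denom, df_num).
for lag, (tests, _models) in sorted(granger_test_result.items()):
    f_stat, p_value, df_denom, df_num = tests['ssr_ftest']
    print(f"lag {lag:2d}: F = {f_stat:8.4f}, p = {p_value:.4g}")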

Answered Sep 28 '22 by rsmith49