Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ACF confidence intervals in R vs python: why are they different?

When I use the acf function in R it plots horizontal lines that represent the confidence interval (95% by default) for the autocorrelations at various lags: R ACF

However, when I use statsmodels.graphics.tsaplots.plot_acf in python I see a curved confidence interval based on a more sophisticated computation: python ACF

Notice that in the R version, the lags up through lag 25 are considered significant. For the same data, in the python version, only the lags up through 20 are considered significant.

What is the difference between these two methods, and which one should I trust more? Can someone explain the theory of the non-constant confidence interval computed by statsmodels.tsa.stattools.acf?

I know I can reproduce the R horizontal lines by simply plotting something like y=[+/-]1.96 / np.sqrt(len(data)). However, I'd like to understand the fancy curved confidence interval.

like image 466
eraoul Avatar asked Nov 13 '16 20:11

eraoul


2 Answers

It has been shown that the autocorrelation coefficient r(k) follows a Gaussian distribution with variance Var(r(k)).

As you have found, in R, the variance is simply calculated as Var(r(k)) = 1/N for all k. While, in python, the variance is calculated using Bartlett’s formula, where Var(r(k)) = 1/N (1 + 2(r(1)^2+r(2)^2+...+r(k-1)^2)). This results in the first increasing, then flattening confidence level shown above.

Source code of ACF variances in python:

varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)

These two distinct formulas are based on different assumptions. The former assumes an i.i.d process and r(k) = 0 for all k != 0, while the later assumes a MA process with order of k-1 where ACF "cuts tail" after lag k.

like image 192
devin-wang Avatar answered Nov 06 '22 03:11

devin-wang


Not really an answer to the theory part of this (which might be better on CrossValidated), but maybe useful ... ?

If you go to the documentation page for statsmodels.tsa.stattools.acf it gives you an option to browse the source code. The code there is:

varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)
interval = stats.norm.ppf(1 - alpha / 2.) * np.sqrt(varacf)
confint = np.array(lzip(acf - interval, acf + interval))

In contrast, the R source code for plot.acf shows

clim0 <- if (with.ci) qnorm((1 + ci)/2)/sqrt(x$n.used) else c(0, 0)

where ci is the confidence level (default=0.95).

like image 23
Ben Bolker Avatar answered Nov 06 '22 04:11

Ben Bolker