When I use the acf function in R, it plots horizontal lines representing the confidence interval (95% by default) for the autocorrelations at various lags:
However, when I use statsmodels.graphics.tsaplots.plot_acf
in python I see a curved confidence interval based on a more sophisticated computation:
Notice that in the R version, the lags up through lag 25 are considered significant, while for the same data the python version considers only the lags up through 20 significant.
What is the difference between these two methods, and which one should I trust more? Can someone explain the theory behind the non-constant confidence interval computed by statsmodels.tsa.stattools.acf?
I know I can reproduce the R horizontal lines by simply plotting y = ±1.96 / np.sqrt(len(data)). However, I'd like to understand the fancy curved confidence interval.
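For reference, a minimal sketch of the constant R-style band (the function name `constant_ci` is my own; it just evaluates z_{1-alpha/2} / sqrt(n)):

```python
import numpy as np
from scipy import stats

def constant_ci(n, alpha=0.05):
    """Half-width of the constant (R-style) confidence band: z_{1-alpha/2} / sqrt(n)."""
    return stats.norm.ppf(1 - alpha / 2) / np.sqrt(n)

# For a series of length 100, the 95% band is roughly +/-0.196 at every lag.
half_width = constant_ci(100)
```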
It has been shown that the sample autocorrelation coefficient r(k) approximately follows a Gaussian distribution; the two libraries differ in how they estimate its variance Var(r(k)).
As you have found, in R the variance is simply calculated as Var(r(k)) = 1/N for all k. In python, the variance is calculated using Bartlett's formula, Var(r(k)) = (1/N) * (1 + 2*(r(1)^2 + r(2)^2 + ... + r(k-1)^2)). This produces the confidence band shown above, which first widens with lag and then flattens.
Source code of ACF variances in python:
varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)
These two distinct formulas are based on different assumptions. The former assumes an i.i.d. process with r(k) = 0 for all k != 0, while the latter assumes an MA(k-1) process, whose ACF "cuts off" after lag k-1.
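To see the difference concretely, here is a sketch that applies Bartlett's formula (mirroring the statsmodels computation quoted above) to a toy set of ACF values; the helper name `bartlett_ci` and the sample numbers are my own:

```python
import numpy as np

def bartlett_ci(acf_vals, n, z=1.96):
    """Bartlett half-widths: Var(r_k) = (1 + 2*sum_{j<k} r_j^2) / n for k >= 2."""
    var = np.ones(len(acf_vals)) / n
    var[0] = 0.0                                   # r(0) = 1 exactly, no uncertainty
    var[2:] *= 1 + 2 * np.cumsum(acf_vals[1:-1] ** 2)
    return z * np.sqrt(var)

# Toy ACF: r(0) = 1 plus geometrically decaying correlations at lags 1..4.
acf_vals = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])
n = 100
bart = bartlett_ci(acf_vals, n)
const = 1.96 / np.sqrt(n)   # R-style band, identical at every lag

# At lag 1 the two bands agree; from lag 2 onward the Bartlett band is wider,
# since it accounts for the nonzero correlations at the earlier lags.
```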
Not really an answer to the theory part of this (which might be better on CrossValidated), but maybe useful ... ?
If you go to the documentation page for statsmodels.tsa.stattools.acf it gives you an option to browse the source code. The code there is:
varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)
interval = stats.norm.ppf(1 - alpha / 2.) * np.sqrt(varacf)
confint = np.array(lzip(acf - interval, acf + interval))
In contrast, the R source code for plot.acf shows
clim0 <- if (with.ci) qnorm((1 + ci)/2)/sqrt(x$n.used) else c(0, 0)
where ci is the confidence level (default = 0.95).