I'm calculating the autocorrelation function for a stock's returns. To do so I tested two functions: the autocorr function built into Pandas, and the acf function supplied by statsmodels.tsa. This is done in the following MWE:
import datetime

import pandas as pd
import matplotlib.pyplot as plt
from dateutil.relativedelta import relativedelta
from pandas_datareader import data
from statsmodels.tsa.stattools import acf

ticker = 'AAPL'
time_ago = datetime.datetime.today().date() - relativedelta(months=6)

# Daily returns over the last six months
ticker_data = data.get_data_yahoo(ticker, time_ago)['Adj Close'].pct_change().dropna()
ticker_data_len = len(ticker_data)

# Lags 1..31 with both implementations (request 31 lags explicitly so the slices match)
ticker_data_acf_1 = acf(ticker_data, nlags=31)[1:32]
ticker_data_acf_2 = [ticker_data.autocorr(i) for i in range(1, 32)]

test_df = pd.DataFrame([ticker_data_acf_1, ticker_data_acf_2]).T
test_df.columns = ['Statsmodels Autocorr', 'Pandas Autocorr']  # match the order of the columns above
test_df.index += 1
test_df.plot(kind='bar')
plt.show()
What I noticed was that the values they produced weren't identical.
What accounts for this difference, and which values should be used?
An ACF plot is a bar chart of the correlation coefficients between a time series and its lagged values. Simply stated: the ACF explains how the present value of a given time series is correlated with its past (1-unit past, 2-unit past, ..., n-unit past) values.
Autocorrelation measures the relationship between a variable's current value and its past values. An autocorrelation of +1 represents a perfect positive correlation, while an autocorrelation of -1 represents a perfect negative correlation.
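For reference, the usual textbook sample autocorrelation at lag k for a series x_1, ..., x_n (my own summary of the standard definition, not a quote from either library's documentation) is

\hat{\rho}_k = \frac{\sum_{t=1}^{n-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}

i.e. the lag-k sample autocovariance divided by the sample variance of the whole series.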
The difference between the Pandas and Statsmodels versions lies in the mean subtraction and the normalization / variance division:
autocorr does nothing more than pass subseries of the original series to np.corrcoef. Inside that method, the sample mean and sample variance of these subseries are used to determine the correlation coefficient.

acf, in contrast, uses the overall series' sample mean and sample variance to determine the correlation coefficient.

The differences may get smaller for longer time series but are quite big for short ones.
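Written out (this is my reading of the two implementations, matching the by-hand functions in the MWE below, not a quote from either library's docs), with y1 = x[:n-k] and y2 = x[k:] the two lagged subseries:

\text{autocorr}(k) = \frac{\sum_t (y_{1,t} - \bar{y}_1)(y_{2,t} - \bar{y}_2)}{(n-k)\,\hat{\sigma}_{y_1}\hat{\sigma}_{y_2}}
\qquad
\text{acf}(k) = \frac{\sum_t (x_t - \bar{x})(x_{t+k} - \bar{x})}{(n-k)\,\hat{\sigma}_x^2}

The acf formula shown here uses the unbiased (n - k) denominator, matching the unbiased=True call in the MWE below; the default normalization divides by n instead.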
Compared to Matlab, the Pandas autocorr function probably corresponds to doing Matlab's xcorr (cross-correlation) with the (lagged) series itself, instead of Matlab's autocorr, which calculates the sample autocorrelation (guessing from the docs; I cannot validate this because I have no access to Matlab).
See this MWE for clarification:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf

plt.style.use("seaborn-colorblind")  # newer matplotlib names this style "seaborn-v0_8-colorblind"

def autocorr_by_hand(x, lag):
    # Slice the relevant subseries based on the lag
    y1 = x[:(len(x) - lag)]
    y2 = x[lag:]
    # Subtract the subseries means
    sum_product = np.sum((y1 - np.mean(y1)) * (y2 - np.mean(y2)))
    # Normalize with the subseries stds
    return sum_product / ((len(x) - lag) * np.std(y1) * np.std(y2))

def acf_by_hand(x, lag):
    # Slice the relevant subseries based on the lag
    y1 = x[:(len(x) - lag)]
    y2 = x[lag:]
    # Subtract the mean of the whole series x to calculate Cov
    sum_product = np.sum((y1 - np.mean(x)) * (y2 - np.mean(x)))
    # Normalize with var of whole series
    return sum_product / ((len(x) - lag) * np.var(x))

x = np.linspace(0, 100, 101)
results = {}
nlags = 10
results["acf_by_hand"] = [acf_by_hand(x, lag) for lag in range(nlags)]
results["autocorr_by_hand"] = [autocorr_by_hand(x, lag) for lag in range(nlags)]
results["autocorr"] = [pd.Series(x).autocorr(lag) for lag in range(nlags)]
results["acf"] = acf(x, unbiased=True, nlags=nlags - 1)  # newer statsmodels calls this argument "adjusted"

pd.DataFrame(results).plot(kind="bar", figsize=(10, 5), grid=True)
plt.xlabel("lag")
plt.ylim([-1.2, 1.2])
plt.ylabel("value")
plt.show()
Statsmodels uses np.correlate to optimize this, but this is basically how it works.
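To illustrate that point, here is a minimal sketch (my own illustration of the idea, not statsmodels' actual source) showing how np.correlate produces the same unnormalized autocovariances, reusing the x array from the MWE above:

import numpy as np
from statsmodels.tsa.stattools import acf

x = np.linspace(0, 100, 101)
xm = x - x.mean()

# 'full' mode returns the sum of lagged products for every possible shift;
# the second half of the output corresponds to lags 0, 1, ..., n-1.
acov = np.correlate(xm, xm, mode="full")[len(x) - 1:]

# Dividing by the lag-0 term gives the ACF with the default (non-adjusted)
# denominator n, which cancels in the ratio. Note this matches acf's default
# normalization, not the unbiased/adjusted (n - lag) version used above.
acf_via_correlate = acov / acov[0]

print(acf_via_correlate[:10])
print(acf(x, nlags=9))  # should agree up to floating-point error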