bug of autocorrelation plot in matplotlib‘s plt.acorr?

Tags:

I am plotting autocorrelation with python. I used three ways to do it: 1. pandas, 2. matplotlib, 3. statsmodels. I found the graph I got from matplotlib is not consistent with the other two. The code is:

 from statsmodels.graphics.tsaplots import *
 # print out data
 print mydata.values

 #1. pandas
 p=autocorrelation_plot(mydata)
 plt.title('mydata')

 #2. matplotlib
 fig=plt.figure()
 plt.acorr(mydata,maxlags=150)
 plt.title('mydata')

 #3. statsmodels.graphics.tsaplots.plot_acf
 plot_acf(mydata)
 plt.title('mydata')

The graph is here: http://quant365.com/viewtopic.php?f=4&t=33

502

asked Dec 18 '14 07:12

Wu Fuheng

1 Answers

This is a result of different common definitions between statistics and signal processing. Basically, the signal processing definition assumes that you're going to handle the detrending. The statistical definition assumes that subtracting the mean is all the detrending you'll do, and does it for you.

First off, let's demonstrate the problem with a stand-alone example:

import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
from statsmodels.graphics import tsaplots

def label(ax, string):
    ax.annotate(string, (1, 1), xytext=(-8, -8), ha='right', va='top',
                size=14, xycoords='axes fraction', textcoords='offset points')

np.random.seed(1977)
data = np.random.normal(0, 1, 100).cumsum()

fig, axes = plt.subplots(nrows=4, figsize=(8, 12))
fig.tight_layout()

axes[0].plot(data)
label(axes[0], 'Raw Data')

axes[1].acorr(data, maxlags=data.size-1)
label(axes[1], 'Matplotlib Autocorrelation')

tsaplots.plot_acf(data, axes[2])
label(axes[2], 'Statsmodels Autocorrelation')

pd.tools.plotting.autocorrelation_plot(data, ax=axes[3])
label(axes[3], 'Pandas Autocorrelation')

# Remove some of the titles and labels that were automatically added
for ax in axes.flat:
    ax.set(title='', xlabel='')
plt.show()

enter image description here

So, why the heck am I saying that they're all correct? They're clearly different!

Let's write our own autocorrelation function to demonstrate what plt.acorr is doing:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()
    autocorr = np.correlate(x, x, mode='full')
    autocorr /= autocorr.max()

    return ax.stem(autocorr)

If we plot this with our data, we'll get a more-or-less identical result to plt.acorr (I'm leaving out properly labeling the lags, simply because I'm lazy):

fig, ax = plt.subplots()
acorr(data)
plt.show()

enter image description here

This is a perfectly valid autocorrelation. It's all a matter of whether your background is signal processing or statistics.

This is the definition used in signal processing. The assumption is that you're going to handle detrending your data (note the detrend kwarg in plt.acorr). If you want it detrended, you'll explictly ask for it (and probably do something better than just subtracting the mean), and otherwise it shouldn't be assumed.

In statistics, simply subtracting the mean is assumed to be what you wanted to do for detrending.

All of the other functions are subtracting the mean of the data before the correlation, similar to this:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()

    x = x - x.mean()

    autocorr = np.correlate(x, x, mode='full')
    autocorr /= autocorr.max()

    return ax.stem(autocorr)

fig, ax = plt.subplots()
acorr(data)
plt.show()

enter image description here

However, we still have one large difference. This one is purely a plotting convention.

In most signal processing textbooks (that I've seen, anyway), the "full" autocorrelation is displayed, such that zero lag is in the center, and the result is symmetric on each side. R, on the other hand, has the very reasonable convention to display only one side of it. (After all, the other side is completely redundant.) The statistical plotting functions follow the R convetion, and plt.acorr follows what Matlab does, which is the opposite convention.

Basically, you'd want this:

def acorr(x, ax=None):
    if ax is None:
        ax = plt.gca()

    x = x - x.mean()

    autocorr = np.correlate(x, x, mode='full')
    autocorr = autocorr[x.size:]
    autocorr /= autocorr.max()

    return ax.stem(autocorr)

fig, ax = plt.subplots()
acorr(data)
plt.show()

enter image description here

answered Sep 20 '22 06:09

Joe Kington

Related questions
                            
                                Python - How to change autopct text color to be white in a pie chart?
                            
                                How to pythonically have partially-mutually exclusive optional arguments?
                            
                                Add response headers to flask web app
                            
                                How to exit a loop in Python? [closed]
                            
                                selecting second child in beautiful soup
                            
                                Pyspark: Replacing value in a column by searching a dictionary
                            
                                Return a variable by name from a function in Python [duplicate]
                            
                                Python Dataframes: Describing a single column
                            
                                delete all items DynamoDB using Python
                            
                                Need to add an element at the start of an iterator in python
                            
                                How to check dimensions of all images in a directory using python?
                            
                                Generating dictionary keys on the fly [duplicate]
                            
                                what's the difference between python objects and json objects?
                            
                                Is there an easy way to make unicode work in python?
                            
                                checking if a string is in alphabetical order in python
                            
                                Confused with getattribute and setattribute in python
                            
                                gcc error when I'm trying to install readline-6.2
                            
                                Repeatedly failing to install scrapy and lxml
                            
                                Python 2.X adding single quotes around a string
                            
                                Draw a map of a specific country with cartopy?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

bug of autocorrelation plot in matplotlib‘s plt.acorr?

Tags:

python

pandas

matplotlib

statsmodels

Wu Fuheng

People also ask

1 Answers

Joe Kington

Recent Activity

Donate For Us