Calculating Autocorrelation of Pandas DataFrame along each Column

Tags: python, pandas, numpy

I want to calculate the lag-one autocorrelation coefficient of each column of a Pandas DataFrame. A snippet of my data is:

            RF        PC         C         D        PN        DN         P
year                                                                      
1890       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1891 -0.028470 -0.052632  0.042254  0.081818 -0.045541  0.047619 -0.016974
1892 -0.249084  0.000000  0.027027  0.067227  0.099404  0.045455  0.122337
1893  0.653659  0.000000  0.000000  0.039370 -0.135624  0.043478 -0.142062

Along year, I want to calculate the lag-one autocorrelation of each column (RF, PC, etc.).

To calculate the autocorrelations, I extracted two time series from each column, offset from each other by one year, and then calculated the correlation coefficient with numpy.corrcoef.

For example, I wrote:

numpy.corrcoef(data[['C']][1:-1],data[['C']][2:])

(the entire DataFrame is called data).
However, the command unfortunately returned:

array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])

Can somebody kindly advise me on how to calculate autocorrelations?

490

asked Sep 28 '14 09:09

fabian

1 Answer

.autocorr is a Series method, not a DataFrame method. You can use .apply to run it on each column of a DataFrame:

import numpy as np
import pandas as pd

def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    # With axis=0, apply passes each column to the lambda as a Series.
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

d1 = pd.DataFrame(np.random.randn(100, 6))

df_autocorr(d1)
Out[32]: 
0    0.141
1   -0.028
2   -0.031
3    0.114
4   -0.121
5    0.060
dtype: float64
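
As a cross-check against the numpy.corrcoef attempt in the question, here is a minimal sketch. The small frame below is made up for illustration (only the column names RF and C, the year index, and the leading NaN row mirror the question), and manual_autocorr is a hypothetical helper, not part of pandas or numpy. Passing one-column DataFrames such as data[['C']] makes corrcoef treat every row as its own variable with a single observation, which is why it comes back as a large all-NaN matrix. Passing 1-D arrays with the NaN removed gives a single coefficient that agrees with .autocorr.

```python
import warnings

import numpy as np
import pandas as pd

def df_autocorr(df, lag=1, axis=0):
    """Column-wise autocorrelation, as defined in the answer above."""
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

def manual_autocorr(series, lag=1):
    """Hypothetical helper: lag autocorrelation via numpy.corrcoef on 1-D arrays.

    Assumes any NaNs sit only at the start or end of the series, so that
    dropping them does not change which observations are adjacent.
    """
    values = series.dropna().to_numpy()
    # Pair each value with the one `lag` steps earlier; corrcoef returns a
    # 2x2 matrix whose off-diagonal entry is the coefficient.
    return np.corrcoef(values[lag:], values[:-lag])[0, 1]

# Made-up data shaped like the question's: a year index and a leading NaN row.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((30, 2)), columns=["RF", "C"],
                    index=pd.RangeIndex(1890, 1920, name="year"))
data.iloc[0] = np.nan

via_pandas = df_autocorr(data)
via_numpy = data.apply(manual_autocorr)

# The call from the question: two (28, 1) frames are stacked into 56
# "variables" with one observation each, so every entry is NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_frames = np.corrcoef(data[["C"]][1:-1], data[["C"]][2:])
```

Here via_pandas and via_numpy hold the same per-column coefficients, while as_frames is a square all-NaN matrix like the one in the question.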

You could also compute rolling autocorrelations with a specified window as follows (full-sample .autocorr does the same thing under the hood: it correlates the series with a copy of itself shifted by lag):

def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""
    return (df.rolling(window=window)
              .corr(df.shift(lag)))  # could .dropna() here

df_rolling_autocorr(d1, window=21).dropna().head()
Out[38]: 
        0      1      2      3      4      5
21 -0.173 -0.367  0.142 -0.044 -0.080  0.012
22  0.015 -0.341  0.250 -0.036  0.023 -0.012
23  0.038 -0.329  0.279 -0.026  0.075 -0.121
24 -0.025 -0.361  0.319  0.117  0.031 -0.120
25  0.119 -0.320  0.181 -0.011  0.038 -0.111
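
To make the windowing concrete, the following sketch checks one rolling value by hand. It assumes random data like d1 above (seeded here so it is reproducible); the row label 50 is an arbitrary choice for illustration. With window=21 and lag=1, the value at row t correlates rows t-20..t with rows t-21..t-1, and the first 21 rows are NaN because the shifted series has no value at row 0.

```python
import numpy as np
import pandas as pd

def df_rolling_autocorr(df, window, lag=1):
    """Rolling column-wise autocorrelation, as defined in the answer above."""
    return df.rolling(window=window).corr(df.shift(lag))

rng = np.random.default_rng(1)
d1 = pd.DataFrame(rng.standard_normal((100, 3)))

window, lag = 21, 1
rolled = df_rolling_autocorr(d1, window=window, lag=lag)

# Recompute the value for column 0 at row 50 directly with numpy.
t = 50
col = d1[0].to_numpy()
current = col[t - window + 1 : t + 1]              # rows 30..50
lagged = col[t - window + 1 - lag : t + 1 - lag]   # rows 29..49
by_hand = np.corrcoef(current, lagged)[0, 1]

first_valid_row = rolled.dropna().index[0]
```

The hand-computed by_hand should agree with rolled.loc[50, 0] up to floating-point error, and first_valid_row equals window + lag - 1, matching the index 21 at which the output above begins.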

122

answered Sep 28 '22 07:09

Brad Solomon
