Calculating Autocorrelation of Pandas DataFrame along each Column

I want to calculate the lag-one autocorrelation coefficient of each column of a Pandas DataFrame. A snippet of my data is:

            RF        PC         C         D        PN        DN         P
year                                                                      
1890       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1891 -0.028470 -0.052632  0.042254  0.081818 -0.045541  0.047619 -0.016974
1892 -0.249084  0.000000  0.027027  0.067227  0.099404  0.045455  0.122337
1893  0.653659  0.000000  0.000000  0.039370 -0.135624  0.043478 -0.142062

I want to calculate the lag-one autocorrelation along the year axis for each column (RF, PC, etc.).

To do this, for each column I extracted two time series offset from each other by one year and then computed their correlation coefficient with numpy.corrcoef.

For example, I wrote:

numpy.corrcoef(data[['C']][1:-1],data[['C']][2:])

(the entire DataFrame is called data).
However, the command returned only NaN values:

array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])

Can somebody kindly advise me on how to calculate autocorrelations?

asked Sep 28 '14 09:09 by fabian



1 Answer

.autocorr is defined on Series, not DataFrames. You can use .apply to run it on each column of a DataFrame:

import numpy as np
import pandas as pd

def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

d1 = pd.DataFrame(np.random.randn(100, 6))

df_autocorr(d1)
Out[32]: 
0    0.141
1   -0.028
2   -0.031
3    0.114
4   -0.121
5    0.060
dtype: float64
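
As for the NaN result in the question, one likely cause is that data[['C']] is a two-dimensional, single-column DataFrame, and numpy.corrcoef by default treats each row as a separate variable, so every "variable" has only one observation. Passing one-dimensional arrays avoids this. The sketch below uses random stand-in data (not the question's real values) and a hypothetical helper named manual_lag1 to show that the manual two-slice approach then agrees with .autocorr:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: column names are borrowed
# from the question, the values are random (assumption for illustration).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((50, 3)), columns=["RF", "PC", "C"])

def manual_lag1(col):
    """Lag-one autocorrelation from two 1-D arrays offset by one step.

    Hypothetical helper; assumes missing values occur only at the start
    of the column, as with the 1890 row in the question.
    """
    x = col.dropna().to_numpy()
    # On 1-D inputs corrcoef returns a 2x2 matrix; [0, 1] is the cross term.
    return np.corrcoef(x[:-1], x[1:])[0, 1]

manual = data.apply(manual_lag1)
builtin = data.apply(lambda col: col.autocorr(1))

# The two columns should agree up to floating-point error.
print(pd.DataFrame({"manual": manual, "builtin": builtin}))
```

In practice the .autocorr route is shorter and handles the alignment of the shifted series for you.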

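You could also compute rolling autocorrelations over a specified window as follows. This applies the same idea that .autocorr uses under the hood (correlating a series with a copy of itself shifted by lag), but within each rolling window: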

def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""
    return df.rolling(window=window).corr(df.shift(lag))  # could .dropna() here

df_rolling_autocorr(d1, window=21).dropna().head()
Out[38]: 
        0      1      2      3      4      5
21 -0.173 -0.367  0.142 -0.044 -0.080  0.012
22  0.015 -0.341  0.250 -0.036  0.023 -0.012
23  0.038 -0.329  0.279 -0.026  0.075 -0.121
24 -0.025 -0.361  0.319  0.117  0.031 -0.120
25  0.119 -0.320  0.181 -0.011  0.038 -0.111
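
To make the rolling result easier to interpret, here is a minimal check on random data (the seed, frame shape, and row label t are arbitrary choices for illustration). It rebuilds one rolling value by hand: with lag 1, the window ending at row t pairs rows t-window+1..t with rows t-window..t-1, which is the lag-one autocorrelation of the window+1 rows ending at t.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
d1 = pd.DataFrame(rng.standard_normal((100, 4)))

window, lag = 21, 1
rolling = d1.rolling(window=window).corr(d1.shift(lag))

# Rebuild the rolling value at an arbitrary row label t by hand.
t = 60
chunk = d1.iloc[t - window : t + 1]          # the window + 1 rows ending at t
by_hand = chunk.apply(lambda col: col.autocorr(lag))

# Prints whether the hand-built values match the rolling ones.
print(np.allclose(rolling.loc[t], by_hand))
```

This also explains why the first non-NaN row above has label 21 rather than 20: the shifted frame is missing its first row, so a full window of 21 valid pairs is only available from row 21 onward.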
answered Sep 28 '22 07:09 by Brad Solomon