I want to calculate the autocorrelation coefficients of lag length one among columns of a Pandas DataFrame. A snippet of my data is:
            RF        PC         C         D        PN        DN         P
year
1890       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1891 -0.028470 -0.052632  0.042254  0.081818 -0.045541  0.047619 -0.016974
1892 -0.249084  0.000000  0.027027  0.067227  0.099404  0.045455  0.122337
1893  0.653659  0.000000  0.000000  0.039370 -0.135624  0.043478 -0.142062
Along year, I want to calculate autocorrelations of lag one for each column (RF, PC, etc...).
To calculate the autocorrelations, I extracted two time series for each column whose start and end points differed by one year, and then calculated correlation coefficients with numpy.corrcoef.
For example, I wrote:
numpy.corrcoef(data[['C']][1:-1],data[['C']][2:])
(the entire DataFrame is called data).
However, the command unfortunately returned:
array([[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
...,
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan],
[ nan, nan, nan, ..., nan, nan, nan]])
Can somebody kindly advise me on how to calculate autocorrelations?
To correlate two columns, assign their names to two variables, col1 and col2, compute df[col1].corr(df[col2]), store the result in a variable corr, and print it.
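As a minimal sketch of how that recipe carries over to the lag-one case, a column can be correlated with a shifted copy of itself. The values below are hypothetical and only for illustration; the name data follows the question.

```python
import numpy as np
import pandas as pd

# Hypothetical values, only for illustration
data = pd.DataFrame(
    {"C": [0.04, 0.03, 0.00, 0.02, 0.05, 0.01, -0.02, 0.03]},
    index=pd.Index(range(1891, 1899), name="year"),
)

col = data["C"]                 # single brackets return a Series (1-D)
corr = col.corr(col.shift(1))   # pairs with a NaN on either side are dropped
print(corr)

# Same quantity with numpy.corrcoef, this time passing 1-D arrays
x = col.to_numpy()
corr_np = np.corrcoef(x[1:], x[:-1])[0, 1]
print(corr_np)
```

In the question, data[['C']] (double brackets) is a DataFrame of shape (n, 1), and numpy.corrcoef by default treats each row as a separate variable. Every "variable" then has a single observation, which is why all entries come out as nan.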
The number of autocorrelations calculated is equal to the effective length of the time series divided by 2, where the effective length of a time series is the number of data points in the series without the pre-data gaps. The number of autocorrelations calculated ranges between a minimum of 2 and a maximum of 400.
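Read literally, the rule above could be sketched as follows. The function name n_autocorrelations is hypothetical, and two details are assumptions not stated in the text: "pre-data gaps" is taken to mean leading missing values, and the division by 2 is rounded down.

```python
import numpy as np
import pandas as pd

def n_autocorrelations(series, min_lags=2, max_lags=400):
    """Number of lags under the rule above (hypothetical helper)."""
    valid = series.notna().to_numpy()
    if valid.any():
        # Assumption: the effective length starts at the first non-missing value
        effective_length = len(series) - int(valid.argmax())
    else:
        effective_length = 0
    # Assumption: integer division; then clip to the stated range [2, 400]
    return int(min(max(effective_length // 2, min_lags), max_lags))

s = pd.Series([np.nan, np.nan] + list(range(10)), dtype=float)
print(n_autocorrelations(s))
```

Here the two leading NaN values are skipped, leaving an effective length of 10 and therefore 5 lags.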
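ACF: In practice, a simple procedure is: Estimate the sample mean: $\bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t$. Calculate the sample autocorrelation: $\hat{\rho}_j = \frac{\sum_{t=j+1}^{T} (y_t - \bar{y})(y_{t-j} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$. Estimate the variance.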
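The sample-mean and sample-autocorrelation steps above can be sketched directly in NumPy. The function name sample_acf is hypothetical, and the sketch assumes the input has no missing values (drop them first).

```python
import numpy as np

def sample_acf(y, lag):
    """Sample autocorrelation at `lag`, using the full-sample mean and variance."""
    y = np.asarray(y, dtype=float)   # assumption: no NaNs in y
    n = len(y)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(y)")
    dev = y - y.mean()                        # deviations from the sample mean
    num = np.sum(dev[lag:] * dev[:n - lag])   # sum over t = lag+1, ..., T
    den = np.sum(dev ** 2)                    # sum over t = 1, ..., T
    return num / den

print(sample_acf([1, 2, 3, 4, 5], lag=1))
```

This estimator uses one mean and one variance for the whole sample. pandas' Series.autocorr instead computes a Pearson correlation between the series and its shifted copy, so the two can differ slightly in short samples.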
.autocorr applies to Series, not DataFrames. You can use .apply to apply it to each column of a DataFrame:
import numpy as np
import pandas as pd

def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

d1 = pd.DataFrame(np.random.randn(100, 6))
df_autocorr(d1)
Out[32]:
0 0.141
1 -0.028
2 -0.031
3 0.114
4 -0.121
5 0.060
dtype: float64
You could also compute rolling autocorrelations with a specified window as follows (.autocorr itself computes col.corr(col.shift(lag)) over the full sample; this is the rolling analogue):
def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""
    return (df.rolling(window=window)
              .corr(df.shift(lag)))  # could .dropna() here

df_rolling_autocorr(d1, window=21).dropna().head()
Out[38]:
        0      1      2      3      4      5
21 -0.173 -0.367  0.142 -0.044 -0.080  0.012
22  0.015 -0.341  0.250 -0.036  0.023 -0.012
23  0.038 -0.329  0.279 -0.026  0.075 -0.121
24 -0.025 -0.361  0.319  0.117  0.031 -0.120
25  0.119 -0.320  0.181 -0.011  0.038 -0.111
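As a rough cross-check of what each rolling value means, the sketch below compares the last rolling value against a full-sample .autocorr on the matching tail of the data. The seeded random frame and the column names a, b, c are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed, illustrative data only
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=["a", "b", "c"])
window, lag = 21, 1

rolling = df.rolling(window=window).corr(df.shift(lag))

# The last rolling value uses the final `window` (value, lagged value) pairs,
# which span the final window + lag rows of each column.
tail = df.iloc[-(window + lag):]
full = tail.apply(lambda col: col.autocorr(lag))

print(rolling.iloc[-1])
print(full)
```

The two printed Series should agree up to floating-point error. The first window + lag - 1 rows of the rolling result are NaN, which is why the output above starts at index 21.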