Matlab's cross-correlation function xcorr(x,y,maxlags) has a maxlags option, which returns the cross-correlation sequence over the lag range [-maxlags:maxlags]. Numpy's numpy.correlate(N,M,mode) has three modes, but none of them lets me set a specific maximum lag. The only available output lengths are full (N+M-1), same (max(M, N)) and valid (max(M, N) - min(M, N) + 1). For len(N) = 60000 and len(M) = 200, I want to set the lag to 100.
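To make the three output lengths concrete, here is a small check. It assumes zero-filled placeholder arrays of the sizes mentioned above; the array contents are made up and only the lengths matter.

```python
import numpy as np

# Placeholder signals with the sizes from the question (contents are arbitrary).
N = np.zeros(60000)
M = np.zeros(200)

# Output length produced by each mode of numpy.correlate.
lengths = {mode: len(np.correlate(N, M, mode=mode))
           for mode in ("full", "same", "valid")}
print(lengths)  # {'full': 60199, 'same': 60000, 'valid': 59801}
```

None of these is the 2 * 100 + 1 = 201 values that a maximum lag of 100 would give.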
matplotlib.pyplot.xcorr has the maxlags parameter. It is actually a wrapper around numpy.correlate, so there is no performance saving. Nevertheless, with normed=False it gives the same result as Matlab's cross-correlation function. Below I edited the code from matplotlib so that it returns only the correlation. The reason is that matplotlib.pyplot.xcorr, used as it is, draws the plot as well. The problem is that if we pass complex data to it, we get a "casting complex to real datatype" warning when matplotlib tries to draw the plot.
<!-- language: python -->

    import numpy as np

    def xcorr(x, y, maxlags=10):
        """Cross-correlation of x and y over lags -maxlags..maxlags."""
        Nx = len(x)
        if Nx != len(y):
            raise ValueError('x and y must be equal length')

        # Full cross-correlation; lag 0 sits at index Nx - 1.
        c = np.correlate(x, y, mode='full')

        if maxlags is None:
            maxlags = Nx - 1

        if maxlags >= Nx or maxlags < 1:
            raise ValueError('maxlags must be None or strictly positive < %d' % Nx)

        # Keep only the 2 * maxlags + 1 values centered on lag 0.
        c = c[Nx - 1 - maxlags:Nx + maxlags]
        return c
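If the cost of computing every lag matters, one possible alternative is to zero-pad x by maxlag samples on each side and call numpy.correlate in valid mode, so that only the wanted lags are evaluated. This is a minimal sketch, not matplotlib's method; the name xcorr_maxlag is hypothetical, and it assumes equal-length 1-D inputs.

```python
import numpy as np

def xcorr_maxlag(x, y, maxlag):
    """Cross-correlation of equal-length 1-D arrays at lags -maxlag..maxlag.

    Hypothetical helper: only the 2 * maxlag + 1 requested lags are computed.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")
    if not 0 < maxlag < len(x):
        raise ValueError("maxlag must satisfy 0 < maxlag < len(x)")
    # Zero-pad x so that 'valid' mode slides y over exactly the wanted lags.
    x_padded = np.pad(x, maxlag)
    return np.correlate(x_padded, y, mode="valid")

# Compare against slicing the full correlation, using made-up complex data.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
y = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
maxlag = 100
fast = xcorr_maxlag(x, y, maxlag)
reference = np.correlate(x, y, mode="full")[len(x) - 1 - maxlag:len(x) + maxlag]
print(np.allclose(fast, reference))
```

The padded call does roughly len(x) * (2 * maxlag + 1) multiply-adds, where the full correlation does on the order of len(x) squared, so the saving grows as maxlag gets small relative to the signal length.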
This is my implementation of the lead-lag correlation, but it is limited to 1-D input and is not guaranteed to be the most efficient. It uses scipy.stats.pearsonr to do the core computation, so the p-value for each coefficient is returned as well. Please modify and optimize it, using this as a straw man.

    import numpy
    from scipy.stats import pearsonr

    def lagcorr(x, y, lag=None, verbose=True):
        '''Compute lead-lag correlations between 2 time series.

        <x>,<y>: 1-D time series.
        <lag>: lag option, could take different forms:
            if 0 or None, compute ordinary correlation and p-value;
            if positive integer, compute lagged correlations with lags
            up to <lag>;
            if negative integer, compute lead correlations with leads
            up to <-lag>;
            if a list, tuple or array of integers, compute
            lead/lag correlations at those leads/lags.

        Note: when talking about lead/lag, <y> is used as the reference.
        Therefore positive lag means <x> lags <y> by <lag>; the computation
        is done by shifting <x> to the left by <lag> with respect to <y>.
        Similarly negative lag means <x> leads <y> by <lag>; the computation
        is done by shifting <x> to the right by <lag> with respect to <y>.

        Return <result>: an (n*2) array, with 1st column the correlation
        coefficients, 2nd column the corresponding p-values.

        Currently only works for 1-D arrays.
        '''
        if len(x) != len(y):
            raise ValueError('Input variables of different lengths.')

        #--------Unify types of <lag>-------------
        if numpy.isscalar(lag):
            if abs(lag) >= len(x):
                raise ValueError('Maximum lag equal or larger than array.')
            if lag < 0:
                lag = -numpy.arange(abs(lag) + 1)
            elif lag == 0:
                lag = [0,]
            else:
                lag = numpy.arange(lag + 1)
        elif lag is None:
            lag = [0,]
        else:
            lag = numpy.asarray(lag)

        #-------Loop over lags---------------------
        result = []
        if verbose:
            print('\n#<lagcorr>: Computing lagged-correlations at lags:', lag)

        for ii in lag:
            # Pick the overlapping parts of <x> and <y> for this lead/lag.
            if ii < 0:
                xs, ys = x[:ii], y[-ii:]
            elif ii == 0:
                xs, ys = x, y
            else:
                xs, ys = x[ii:], y[:-ii]
            r, p = pearsonr(xs, ys)
            result.append((r, p))

        result = numpy.asarray(result)
        return result
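As a quick sanity check of the sign convention described in the docstring (positive lag means x lags y), the following self-contained sketch builds x as a delayed copy of y and looks for the lag with the highest correlation. It assumes made-up random data and uses numpy.corrcoef instead of scipy.stats.pearsonr, so no p-values are produced; the helper name lagged_pearson is hypothetical.

```python
import numpy as np

def lagged_pearson(x, y, lags):
    """Pearson r between x and y at each integer lag (positive: x lags y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = []
    for k in lags:
        # Same slicing convention as lagcorr: keep only the overlapping samples.
        if k > 0:
            a, b = x[k:], y[:-k]
        elif k < 0:
            a, b = x[:k], y[-k:]
        else:
            a, b = x, y
        out.append(np.corrcoef(a, b)[0, 1])
    return np.array(out)

rng = np.random.default_rng(0)
y = rng.standard_normal(500)
x = np.roll(y, 3)  # x[n] = y[n - 3], i.e. x lags y by 3 samples

lags = np.arange(-5, 6)
r = lagged_pearson(x, y, lags)
best_lag = lags[np.argmax(r)]
print(best_lag)  # 3
```

At lag 3 the two slices are identical, so the coefficient there is 1 and the peak lands on the delay that was built in.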