Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Specify lag in numpy.correlate

Tags:

python

numpy

Matlab's cross-correlation function xcorr(x,y,maxlags) has an option maxlag, which returns the cross-correlation sequence over the lag range [-maxlags:maxlags]. Numpy's numpy.correlate(N,M,mode) has three modes, but none of them allow me to set a specific lag, which is different from full (N+M-1), same (max(M, N)) or valid (max(M, N) - min(M, N) + 1 ). For len(N) = 60000, len (M) = 200, I want to set the lag as 100.

like image 856
Jyotika Avatar asked Feb 21 '12 17:02

Jyotika


2 Answers

matplotlib.xcorr has the maxlags param. It is actually a wrapper of the numpy.correlate, so there is no performance saving. Nevertheless it gives exactly the same result given by Matlab's cross-correlation function. Below I edited the code from maxplotlib so that it will return only the correlation. The reason is that if we use matplotlib.corr as it is, it will return the plot as well. The problem is, if we put complex data type as the arguments into it, we will get "casting complex to real datatype" warning when matplotlib tries to draw the plot.

<!-- language: python -->

import numpy as np
import matplotlib.pyplot as plt

def xcorr(x, y, maxlags=10):
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')

    c = np.correlate(x, y, mode=2)

    if maxlags is None:
        maxlags = Nx - 1

    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)

    c = c[Nx - 1 - maxlags:Nx + maxlags]

    return c
like image 176
Leonard AB Avatar answered Oct 05 '22 05:10

Leonard AB


This is my implementation of the lead-lag correlation, but it is limited to be 1-D and not guaranteed to be the best in terms of efficient. It uses the scipy.stats.pearsonr to the do the core computation, so also returned is the p value for the coefficient. Please modify to optimize based on this straw man.

def lagcorr(x,y,lag=None,verbose=True):
    '''Compute lead-lag correlations between 2 time series.

    <x>,<y>: 1-D time series.
    <lag>: lag option, could take different forms of <lag>:
          if 0 or None, compute ordinary correlation and p-value;
          if positive integer, compute lagged correlation with lag
          upto <lag>;
          if negative integer, compute lead correlation with lead
          upto <-lag>;
          if pass in an list or tuple or array of integers, compute 
          lead/lag correlations at different leads/lags.

    Note: when talking about lead/lag, uses <y> as a reference.
    Therefore positive lag means <x> lags <y> by <lag>, computation is
    done by shifting <x> to the left hand side by <lag> with respect to
    <y>.
    Similarly negative lag means <x> leads <y> by <lag>, computation is
    done by shifting <x> to the right hand side by <lag> with respect to
    <y>.

    Return <result>: a (n*2) array, with 1st column the correlation 
    coefficients, 2nd column correpsonding p values.

    Currently only works for 1-D arrays.
    '''

    import numpy
    from scipy.stats import pearsonr

    if len(x)!=len(y):
        raise('Input variables of different lengths.')

    #--------Unify types of <lag>-------------
    if numpy.isscalar(lag):
        if abs(lag)>=len(x):
            raise('Maximum lag equal or larger than array.')
        if lag<0:
            lag=-numpy.arange(abs(lag)+1)
        elif lag==0:
            lag=[0,]
        else:
            lag=numpy.arange(lag+1)    
    elif lag is None:
        lag=[0,]
    else:
        lag=numpy.asarray(lag)

    #-------Loop over lags---------------------
    result=[]
    if verbose:
        print '\n#<lagcorr>: Computing lagged-correlations at lags:',lag

    for ii in lag:
        if ii<0:
            result.append(pearsonr(x[:ii],y[-ii:]))
        elif ii==0:
            result.append(pearsonr(x,y))
        elif ii>0:
            result.append(pearsonr(x[ii:],y[:-ii]))

    result=numpy.asarray(result)

    return result
like image 24
Jason Avatar answered Oct 05 '22 04:10

Jason