Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to limit cross correlation window width in Numpy?

I am learning numpy/scipy, coming from a MATLAB background. The xcorr function in Matlab has an optional argument "maxlag" that limits the lag range from –maxlag to maxlag. This is very useful if you are looking at the cross-correlation between two very long time series but are only interested in the correlation within a certain time range. The performance increases are enormous considering that cross-correlation is incredibly expensive to compute.

In numpy/scipy it seems there are several options for computing cross-correlation. numpy.correlate, numpy.convolve, scipy.signal.fftconvolve. If someone wishes to explain the difference between these, I'd be happy to hear, but mainly what is troubling me is that none of them have a maxlag feature. This means that even if I only want to see correlations between two time series with lags between -100 and +100 ms, for example, it will still calculate the correlation for every lag between -20000 and +20000 ms (which is the length of the time series). This gives a 200x performance hit! Do I have to recode the cross-correlation function by hand to include this feature?

like image 203
honi Avatar asked Jun 05 '15 23:06

honi


3 Answers

Here are a couple functions to compute auto- and cross-correlation with limited lags. The order of multiplication (and conjugation, in the complex case) was chosen to match the corresponding behavior of numpy.correlate.

import numpy as np
from numpy.lib.stride_tricks import as_strided


def _check_arg(x, xname):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError('%s must be one-dimensional.' % xname)
    return x

def autocorrelation(x, maxlag):
    """
    Autocorrelation with a maximum number of lags.

    `x` must be a one-dimensional numpy array.

    This computes the same result as
        numpy.correlate(x, x, mode='full')[len(x)-1:len(x)+maxlag]

    The return value has length maxlag + 1.
    """
    x = _check_arg(x, 'x')
    p = np.pad(x.conj(), maxlag, mode='constant')
    T = as_strided(p[maxlag:], shape=(maxlag+1, len(x) + maxlag),
                   strides=(-p.strides[0], p.strides[0]))
    return T.dot(p[maxlag:].conj())


def crosscorrelation(x, y, maxlag):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(a)-maxlag-1:len(a)+maxlag]

    The return vaue has length 2*maxlag + 1.
    """
    x = _check_arg(x, 'x')
    y = _check_arg(y, 'y')
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    return T.dot(px)

For example,

In [367]: x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])

In [368]: autocorrelation(x, 3)
Out[368]: array([ 20.5,   5. ,  -3.5,  -1. ])

In [369]: np.correlate(x, x, mode='full')[7:11]
Out[369]: array([ 20.5,   5. ,  -3.5,  -1. ])

In [370]: y = np.arange(8)

In [371]: crosscorrelation(x, y, 3)
Out[371]: array([  5. ,  23.5,  32. ,  21. ,  16. ,  12.5,   9. ])

In [372]: np.correlate(x, y, mode='full')[4:11]
Out[372]: array([  5. ,  23.5,  32. ,  21. ,  16. ,  12.5,   9. ])

(It will be nice to have such a feature in numpy itself.)

like image 134
Warren Weckesser Avatar answered Oct 07 '22 15:10

Warren Weckesser


Until numpy implements the maxlag argument, you can use the function ucorrelate from the pycorrelate package. ucorrelate operates on numpy arrays and has a maxlag keyword. It implements the correlation from using a for-loop and optimizes the execution speed with numba.

Example - autocorrelation with 3 time lags:

import numpy as np
import pycorrelate as pyc

x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = pyc.ucorrelate(x, x, maxlag=3)
c

Result:

Out[1]: array([20,  5, -3])

The pycorrelate documentation contains a notebook showing perfect match between pycorrelate.ucorrelate and numpy.correlate:

enter image description here

like image 30
user2304916 Avatar answered Oct 07 '22 16:10

user2304916


matplotlib.pyplot provides matlab like syntax for computating and plotting of cross correlation , auto correlation etc.

You can use xcorr which allows to define the maxlags parameter.

    import matplotlib.pyplot as plt


    import numpy  as np


    data = np.arange(0,2*np.pi,0.01)


    y1 = np.sin(data)


    y2 = np.cos(data)


    coeff = plt.xcorr(y1,y2,maxlags=10)

    print(*coeff)


[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
   8   9  10] [ -9.81991753e-02  -8.85505028e-02  -7.88613080e-02  -6.91325329e-02
  -5.93651264e-02  -4.95600447e-02  -3.97182508e-02  -2.98407146e-02
  -1.99284126e-02  -9.98232812e-03  -3.45104289e-06   9.98555430e-03
   1.99417667e-02   2.98641953e-02   3.97518558e-02   4.96037706e-02
   5.94189688e-02   6.91964864e-02   7.89353663e-02   8.86346584e-02
   9.82934198e-02] <matplotlib.collections.LineCollection object at 0x00000000074A9E80> Line2D(_line0)
like image 3
Siva-Sg Avatar answered Oct 07 '22 15:10

Siva-Sg