Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Autocorrelation of a multidimensional array in numpy

Tags:

python

numpy

I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).

I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?

Thanks!

EDIT:

Based on the chosen answer plus the comment from mtrw, I have the following function:

def xcorr(x):
  """FFT based autocorrelation function, which is faster than numpy.correlate"""
  # x is supposed to be an array of sequences, of shape (totalelements, length)
  fftx = fft(x, n=(length*2-1), axis=1)
  ret = ifft(fftx * np.conjugate(fftx), axis=1)
  ret = fftshift(ret, axes=1)
  return ret

Note that length is a global variable in my code, so be sure to declare it. I also didn't restrict the result to real numbers, since I need to take into account complex numbers as well.

like image 964
Christoph Avatar asked Dec 21 '10 19:12

Christoph


People also ask

Does NumPy support multidimensional arrays?

NumPy is a library in python adding support for large multidimensional arrays and matrices along with high level mathematical functions to operate these arrays.

What characterizes multidimensional NumPy arrays?

A NumPy array is a homogeneous block of data organized in a multidimensional finite grid. All elements of the array share the same data type, also called dtype (integer, floating-point number, and so on). The shape of the array is an n-tuple that gives the size of each axis.

What is NumPy correlate?

numpy.correlate() function defines the cross-correlation of two 1-dimensional sequences. This function computes the correlation as generally defined in signal processing texts: c_{av}[k] = sum_n a[n+k] * conj(v[n]) Syntax : numpy.correlate(a, v, mode = 'valid') Parameters : a, v : [array_like] Input sequences.


2 Answers

Using FFT-based autocorrelation:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
print data
##[[ 0  1  2  3]
## [ 4  5  6  7]
## [ 8  9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print dataAC
##[[   14.     8.     6.     8.]
## [  126.   120.   118.   120.]
## [  366.   360.   358.   360.]
## [  734.   728.   726.   728.]
## [ 1230.  1224.  1222.  1224.]]

I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.

EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print dataPadded
##[[  0.   1.   2.   3.   0.   0.   0.   0.]
## [  4.   5.   6.   7.   0.   0.   0.   0.]
## [  8.   9.  10.  11.   0.   0.   0.   0.]
## [ 12.  13.  14.  15.   0.   0.   0.   0.]
## [ 16.  17.  18.  19.   0.   0.   0.   0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print numpy.round(dataAC, 10)[:, :4]
##[[   14.     8.     3.     0.     0.     3.     8.]
## [  126.    92.    59.    28.    28.    59.    92.]
## [  366.   272.   179.    88.    88.   179.   272.]
## [  734.   548.   363.   180.   180.   363.   548.]
## [ 1230.   920.   611.   304.   304.   611.   920.]]

There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.

like image 77
Andrew Avatar answered Sep 20 '22 05:09

Andrew


For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:

def xcorr(x):
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n = l, axis = 1)
    ret = ifft(fftx * np.conjugate(fftx), axis = 1)
    ret = fftshift(ret, axes=1)
    return ret

This might give you wrap-around errors. For large arrays the auto correlation should be insignificant near the edges, though.

like image 26
Lasse Avatar answered Sep 18 '22 05:09

Lasse