I have a two dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array, I would get 5 results, or an array of dimension (5,7).
I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?
Thanks!
EDIT:
Based on the chosen answer plus the comment from mtrw, I have the following function:
def xcorr(x):
"""FFT based autocorrelation function, which is faster than numpy.correlate"""
# x is supposed to be an array of sequences, of shape (totalelements, length)
fftx = fft(x, n=(length*2-1), axis=1)
ret = ifft(fftx * np.conjugate(fftx), axis=1)
ret = fftshift(ret, axes=1)
return ret
Note that length is a global variable in my code, so be sure to declare it. I also didn't restrict the result to real numbers, since I need to take into account complex numbers as well.
NumPy is a library in python adding support for large multidimensional arrays and matrices along with high level mathematical functions to operate these arrays.
A NumPy array is a homogeneous block of data organized in a multidimensional finite grid. All elements of the array share the same data type, also called dtype (integer, floating-point number, and so on). The shape of the array is an n-tuple that gives the size of each axis.
numpy.correlate() function defines the cross-correlation of two 1-dimensional sequences. This function computes the correlation as generally defined in signal processing texts: c_{av}[k] = sum_n a[n+k] * conj(v[n]) Syntax : numpy.correlate(a, v, mode = 'valid') Parameters : a, v : [array_like] Input sequences.
Using FFT-based autocorrelation:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
print data
##[[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print dataAC
##[[ 14. 8. 6. 8.]
## [ 126. 120. 118. 120.]
## [ 366. 360. 358. 360.]
## [ 734. 728. 726. 728.]
## [ 1230. 1224. 1222. 1224.]]
I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:
import numpy
from numpy.fft import fft, ifft
data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print dataPadded
##[[ 0. 1. 2. 3. 0. 0. 0. 0.]
## [ 4. 5. 6. 7. 0. 0. 0. 0.]
## [ 8. 9. 10. 11. 0. 0. 0. 0.]
## [ 12. 13. 14. 15. 0. 0. 0. 0.]
## [ 16. 17. 18. 19. 0. 0. 0. 0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print numpy.round(dataAC, 10)[:, :4]
##[[ 14. 8. 3. 0. 0. 3. 8.]
## [ 126. 92. 59. 28. 28. 59. 92.]
## [ 366. 272. 179. 88. 88. 179. 272.]
## [ 734. 548. 363. 180. 180. 363. 548.]
## [ 1230. 920. 611. 304. 304. 611. 920.]]
There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
For really large arrays it becomes important to have n = 2 ** p, where p is an integer. This will save you huge amounts of time. For example:
def xcorr(x):
l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
fftx = fft(x, n = l, axis = 1)
ret = ifft(fftx * np.conjugate(fftx), axis = 1)
ret = fftshift(ret, axes=1)
return ret
This might give you wrap-around errors. For large arrays the auto correlation should be insignificant near the edges, though.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With