Autocorrelation of a multidimensional array in numpy

Tags: python, numpy

I have a two-dimensional array, i.e. an array of sequences which are also arrays. For each sequence I would like to calculate the autocorrelation, so that for a (5,4) array I would get 5 results, i.e. an array of shape (5,7), since the full autocorrelation of a length-4 sequence has 2*4-1 = 7 lags.

I know I could just loop over the first dimension, but that's slow and my last resort. Is there another way?

Thanks!

EDIT:

Based on the chosen answer plus the comment from mtrw, I have the following function:

import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
  """FFT based autocorrelation function, which is faster than numpy.correlate"""
  # x is supposed to be an array of sequences, of shape (totalelements, length)
  length = x.shape[1]
  # zero-pad each row to 2*length-1 samples so the correlation does not wrap around
  fftx = fft(x, n=(length*2-1), axis=1)
  ret = ifft(fftx * np.conjugate(fftx), axis=1)
  # move the zero lag to the centre of each row
  ret = fftshift(ret, axes=1)
  return ret

Here length is read from x.shape[1]; in my own code it is a global variable. I also didn't restrict the result to real numbers, since I need to take complex numbers into account as well.
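A quick way to check a function like this is to compare it against the row-by-row loop it is meant to replace. The sketch below assumes real-valued input and redefines xcorr with the length taken from x.shape[1]; the test array and the name reference are only illustrative.

```python
import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    """FFT-based autocorrelation of each row of x, shape (n_sequences, length)."""
    length = x.shape[1]  # assumption: read the length from the input, not a global
    fftx = fft(x, n=2 * length - 1, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    return fftshift(ret, axes=1)

x = np.arange(5 * 4).reshape(5, 4)

# Reference result: the slow loop over rows using numpy.correlate in "full" mode.
reference = np.array([np.correlate(row, row, mode="full") for row in x])

result = xcorr(x)
print(result.shape)                         # (5, 7)
print(np.allclose(result.real, reference))  # True
```

The result is complex, so .real is taken only for the comparison with the real-valued reference.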

asked Dec 21 '10 19:12

Christoph

2 Answers

Using FFT-based autocorrelation:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
print(data)
##[[ 0  1  2  3]
## [ 4  5  6  7]
## [ 8  9 10 11]
## [12 13 14 15]
## [16 17 18 19]]
dataFT = fft(data, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(dataAC)
##[[   14.     8.     6.     8.]
## [  126.   120.   118.   120.]
## [  366.   360.   358.   360.]
## [  734.   728.   726.   728.]
## [ 1230.  1224.  1222.  1224.]]

I'm a little confused by your statement about the answer having dimension (5, 7), so maybe there's something important I'm not understanding.
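One possible reading of the (5, 7): the unpadded FFT product above gives the circular autocorrelation, with 4 lags per row, whereas the full linear autocorrelation of a length-4 row has 2*4-1 = 7 lags. The sketch below only illustrates the circular interpretation by recomputing it directly with numpy.roll; the names circular_fft and circular_direct are made up for this example.

```python
import numpy as np
from numpy.fft import fft, ifft

data = np.arange(5 * 4).reshape(5, 4)
n = data.shape[1]

# Circular autocorrelation via FFT with no zero padding.
dataFT = fft(data, axis=1)
circular_fft = ifft(dataFT * np.conjugate(dataFT), axis=1).real

# The same quantity computed directly: rolling by k wraps elements around the row.
circular_direct = np.stack(
    [(data * np.roll(data, -k, axis=1)).sum(axis=1) for k in range(n)], axis=1
)

print(np.allclose(circular_fft, circular_direct))  # True
```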

EDIT: At the suggestion of mtrw, a padded version that doesn't wrap around:

import numpy
from numpy.fft import fft, ifft

data = numpy.arange(5*4).reshape(5, 4)
padding = numpy.zeros((5, 3))
dataPadded = numpy.concatenate((data, padding), axis=1)
print(dataPadded)
##[[  0.   1.   2.   3.   0.   0.   0.]
## [  4.   5.   6.   7.   0.   0.   0.]
## [  8.   9.  10.  11.   0.   0.   0.]
## [ 12.  13.  14.  15.   0.   0.   0.]
## [ 16.  17.  18.  19.   0.   0.   0.]]
dataFT = fft(dataPadded, axis=1)
dataAC = ifft(dataFT * numpy.conjugate(dataFT), axis=1).real
print(numpy.round(dataAC, 10))
##[[   14.     8.     3.     0.     0.     3.     8.]
## [  126.    92.    59.    28.    28.    59.    92.]
## [  366.   272.   179.    88.    88.   179.   272.]
## [  734.   548.   363.   180.   180.   363.   548.]
## [ 1230.   920.   611.   304.   304.   611.   920.]]

There must be a more efficient way to do this, especially because autocorrelation is symmetric and I don't take advantage of that.
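One way to use that symmetry, assuming the input is real-valued, might be numpy.fft.rfft and numpy.fft.irfft, which work on only half of the spectrum, and then keeping only the non-negative lags. This is a sketch rather than a benchmarked solution, and autocorr_real is a name invented for this example.

```python
import numpy as np
from numpy.fft import rfft, irfft

def autocorr_real(x):
    """Non-negative lags of the linear autocorrelation of each real-valued row."""
    length = x.shape[1]
    n = 2 * length - 1                    # enough zero padding to avoid wrap-around
    spectrum = rfft(x, n=n, axis=1)       # half spectrum, valid for real input only
    power = spectrum.real ** 2 + spectrum.imag ** 2
    full = irfft(power, n=n, axis=1)      # n must be passed because it is odd
    return full[:, :length]               # lags 0 .. length-1; negative lags mirror them

data = np.arange(5 * 4).reshape(5, 4)
print(np.allclose(autocorr_real(data)[0], [14, 8, 3, 0]))  # True
```

For complex input this shortcut does not apply, and the full fft/ifft version is needed.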

answered Sep 20 '22 05:09

Andrew

For really large arrays it becomes important to have n = 2 ** p, where p is an integer, because the FFT is most efficient when its length is a power of two. This can save you a lot of time. For example:

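import numpy as np
from numpy.fft import fft, ifft, fftshift

def xcorr(x):
    # largest power of two not exceeding 2*length - 1 (note: this rounds down)
    l = 2 ** int(np.log2(x.shape[1] * 2 - 1))
    fftx = fft(x, n=l, axis=1)
    ret = ifft(fftx * np.conjugate(fftx), axis=1)
    ret = fftshift(ret, axes=1)
    return ret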

Since l is rounded down, it is generally smaller than 2 * length - 1, so this might give you wrap-around errors. For large arrays the autocorrelation should be insignificant near the edges, though.
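If the wrap-around matters, a variant that rounds the FFT length up to the next power of two and then trims the output back to 2 * length - 1 lags should avoid it while keeping a power-of-two transform. The following is a sketch under that assumption; xcorr_pow2 is a hypothetical name, and the check uses real-valued test data only.

```python
import numpy as np
from numpy.fft import fft, ifft

def xcorr_pow2(x):
    """Linear autocorrelation of each row using a power-of-two FFT length."""
    length = x.shape[1]
    n_lags = 2 * length - 1
    n_fft = 1 << (n_lags - 1).bit_length()  # smallest power of two >= n_lags
    fftx = fft(x, n=n_fft, axis=1)
    ac = ifft(fftx * np.conjugate(fftx), axis=1)
    # Non-negative lags are at the start of each row, negative lags at the end.
    negative = ac[:, n_fft - length + 1:]   # lags -(length-1) .. -1
    non_negative = ac[:, :length]           # lags 0 .. length-1
    return np.concatenate((negative, non_negative), axis=1)

x = np.arange(5 * 4).reshape(5, 4)
reference = np.array([np.correlate(row, row, mode="full") for row in x])
print(np.allclose(xcorr_pow2(x).real, reference))  # True
```

The price is a transform that can be up to roughly twice as long as 2 * length - 1, so whether it is faster in practice is something to measure for the array sizes at hand.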

answered Sep 18 '22 05:09

Lasse
