Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cross-correlation of non-periodic function with NumPy

I have two data sets that I'm trying to cross-correlate. They look similar to the arctan function, so I've been using it as a model to work out how to do my signal processing.

x = linspace(-15, 15, 2**13)
f1 = arctan(x)
f2 = arctan(x + 2)

enter image description here

The question I need to answer is, how much do I need to shift the green signal to get it to (mostly) overlap with the blue one? I thought it would be as simple as finding the maximum in the cross-correlation function of f1 and f2, and I broadly followed the advice here: How to correlate two time series with gaps and different time bases?. This is what I've been trying

c = correlate(f1, f2, 'full')
s = arange(1-2**13, 2**13)
dx = 30/2**13
shift = s[c.argmax()]*dx

I would expect shift to equal more or less exactly 2, but in fact it's only 0.234. This doesn't make any sense to me; I've found the x-coordinate of the maximum in cross-correlation, which should be found where there is maximal overlap of the two signals.

Any ideas on how to calculate this quantity for this kind of function?

EDIT: I should add, for my real data, all of the values will be between zero and one

EDIT EDIT: The following functions are actually more like my real data:

x = linspace(-15, 15, 400)
f1 = (arctan(-x) + pi/2) / pi
f2 = (arctan(-x + 2) + pi/2) / pi

enter image description here

So using the formula given here: http://paulbourke.net/miscellaneous/correlate/ I can write a cross-correlation function that pads the data to add ones to the left and zeros to the right:

def xcorr(x, y);
    mx = x.mean()
    my = y.mean()
    sx = x.std()
    sy = y.std()
    r = zeros(2*len(x))

    for d in range(-len(x), len(x)):
        csum = 0
        for i in range(0, len(x):
            yindx = i - d
            if i - d < 0:
                yval = 1
            elif i - d >= len(x):
                yval = 0
            else:
                yval = y[yindx]
            csum += (x[i] - mx) * (yval - my)
        r[d + len(x)] = csum / (sx * sy)
    return r

With this function, I can now do

c = xcorr(f1, f2)
s = arange(-400, 400)
dx = 30/400
shift = s[c.argmax()] * dx

Which comes out to 2.025, which is as close as you can get to 2 with this precision. So it looks like Jamie was correct, the issue is in how numpy correlate does the padding of signals.

So, obviously my xcorr function is really slow as it stands. The question now is, is there a way I can make NumPy do a similar thing, or should I just proceed to write my algorithm in C using ctypes?

like image 461
KJ Tsanaktsidis Avatar asked Oct 04 '22 10:10

KJ Tsanaktsidis


2 Answers

As people have pointed out, the cross-correlation is confounded by the padding beyond the data.

Although it feels like you're throwing away good data, it's often better to just trim the data set so that the correlation can be done without assumptions (at least when compared to the alternative of correlating actual data with made up data for the padding).

x = linspace(-15, 15, 4000)
f1 = (arctan(-x) + pi/2) / pi
f2 = (arctan(-x + 2) + pi/2) / pi

L4 = int(len(f2)/8)
sf2 = f2[L4:-L4]

c = correlate(f1-mean(f1), sf2-mean(f1), 'same')
print "peak correlation occurs at:", x[argmax(c)]  # -2.02925731433

plt.plot(x, c)
plt.show()

enter image description here

Also, though, I'm not sure that xcorr is the best approach here. How about just summing the distance between the y-axis values for different shifts and taking the minimum, which gets away from all the issues of where the zero is, etc.

like image 186
tom10 Avatar answered Oct 10 '22 01:10

tom10


What the maximum of the cross-correlation finds is the shift at which the sum of the products of your two signals is a maximum. One would think that the cross correlation of two signals, one a time shift of the other, would be maximum at the shift. And while this is true for infinite signals, numpy zero-pads your two signals, so with two equally long signals, you only have a summation over 2**13 non-zero values for a shift of zero, and the higher values from a better matching of the two functions, are offset but there being less non-zero values.

If your signals were 0 at +/- infinity, this wouldn´t be too bad. As it is, I haven't been able to come up with any reasonable solution using cross-correlation.

like image 45
Jaime Avatar answered Oct 10 '22 03:10

Jaime