I have two data sets that I'm trying to cross-correlate. They look similar to the arctan
function, so I've been using it as a model to work out how to do my signal processing.
x = linspace(-15, 15, 2**13)
f1 = arctan(x)
f2 = arctan(x + 2)
The question I need to answer is, how much do I need to shift the green signal to get it to (mostly) overlap with the blue one? I thought it would be as simple as finding the maximum in the cross-correlation function of f1
and f2
, and I broadly followed the advice here: How to correlate two time series with gaps and different time bases?. This is what I've been trying
c = correlate(f1, f2, 'full')
s = arange(1-2**13, 2**13)
dx = 30/2**13
shift = s[c.argmax()]*dx
I would expect shift
to equal more or less exactly 2, but in fact it's only 0.234
. This doesn't make any sense to me; I've found the x-coordinate of the maximum in cross-correlation, which should be found where there is maximal overlap of the two signals.
Any ideas on how to calculate this quantity for this kind of function?
EDIT: I should add, for my real data, all of the values will be between zero and one
EDIT EDIT: The following functions are actually more like my real data:
x = linspace(-15, 15, 400)
f1 = (arctan(-x) + pi/2) / pi
f2 = (arctan(-x + 2) + pi/2) / pi
So using the formula given here: http://paulbourke.net/miscellaneous/correlate/ I can write a cross-correlation function that pads the data to add ones to the left and zeros to the right:
def xcorr(x, y);
mx = x.mean()
my = y.mean()
sx = x.std()
sy = y.std()
r = zeros(2*len(x))
for d in range(-len(x), len(x)):
csum = 0
for i in range(0, len(x):
yindx = i - d
if i - d < 0:
yval = 1
elif i - d >= len(x):
yval = 0
else:
yval = y[yindx]
csum += (x[i] - mx) * (yval - my)
r[d + len(x)] = csum / (sx * sy)
return r
With this function, I can now do
c = xcorr(f1, f2)
s = arange(-400, 400)
dx = 30/400
shift = s[c.argmax()] * dx
Which comes out to 2.025, which is as close as you can get to 2 with this precision. So it looks like Jamie was correct, the issue is in how numpy correlate
does the padding of signals.
So, obviously my xcorr
function is really slow as it stands. The question now is, is there a way I can make NumPy do a similar thing, or should I just proceed to write my algorithm in C using ctypes
?
As people have pointed out, the cross-correlation is confounded by the padding beyond the data.
Although it feels like you're throwing away good data, it's often better to just trim the data set so that the correlation can be done without assumptions (at least when compared to the alternative of correlating actual data with made up data for the padding).
x = linspace(-15, 15, 4000)
f1 = (arctan(-x) + pi/2) / pi
f2 = (arctan(-x + 2) + pi/2) / pi
L4 = int(len(f2)/8)
sf2 = f2[L4:-L4]
c = correlate(f1-mean(f1), sf2-mean(f1), 'same')
print "peak correlation occurs at:", x[argmax(c)] # -2.02925731433
plt.plot(x, c)
plt.show()
Also, though, I'm not sure that xcorr is the best approach here. How about just summing the distance between the y-axis values for different shifts and taking the minimum, which gets away from all the issues of where the zero is, etc.
What the maximum of the cross-correlation finds is the shift at which the sum of the products of your two signals is a maximum. One would think that the cross correlation of two signals, one a time shift of the other, would be maximum at the shift. And while this is true for infinite signals, numpy zero-pads your two signals, so with two equally long signals, you only have a summation over 2**13
non-zero values for a shift of zero, and the higher values from a better matching of the two functions, are offset but there being less non-zero values.
If your signals were 0 at +/- infinity, this wouldn´t be too bad. As it is, I haven't been able to come up with any reasonable solution using cross-correlation.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With