I have a question on xcorr in Python. Say that I do the following:
output=plt.xcorr(x,y, maxlags=4)
Which time series is lagged? The output will be the cross-correlation between x and y at lags t = -4 to +4. So does the output refer to the cross-correlation between x and y as follows: [equation missing], or is it the reverse between x and y?
I tried to dig into the code of xcorr to get a better idea (see here) but I am a bit lost ... np.correlate(x,y,mode = 2). What does mode = 2 mean? The only modes I see here are valid, full, or same.
The mode parameter determines what happens near the boundaries. If you have input vectors with lengths x and y (x > y):
valid / 0: you only receive the portion of the convolution where both signals overlap completely (x-y+1 points)
same / 1: the length of the output vector is the same as the length of the longer input vector (x points)
full / 2: all data from the area where there is even a single sample of overlap between the signals (x+y-1 points)
The numbers for these modes are not prominently documented, but they can be found in numpy's source code. In any case xcorr uses the full mode. (Actually, only the first letters of the mode names matter when giving the mode for convolve or correlate.)
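As a rough illustration of the three output lengths listed above, here is a small pure-Python sketch. The helper name output_length is made up for this example and is not part of numpy; it only encodes the formulas from the list.

```python
def output_length(n_long, n_short, mode):
    """Number of output points for input lengths n_long >= n_short.

    Hypothetical helper for illustration only: it encodes the length
    formulas given in the text and is not a numpy function.
    """
    if n_long < n_short:
        raise ValueError("expected n_long >= n_short")
    if mode == "valid":
        return n_long - n_short + 1   # positions with complete overlap
    if mode == "same":
        return n_long                 # as long as the longer input
    if mode == "full":
        return n_long + n_short - 1   # positions with any overlap at all
    raise ValueError("unknown mode: %r" % (mode,))


lengths = {m: output_length(5, 2, m) for m in ("valid", "same", "full")}
print(lengths)  # {'valid': 4, 'same': 5, 'full': 6}
```

For two equal-length inputs of N samples, which is the xcorr case, the full mode therefore gives 2N-1 points.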
There is some confusion as to what these functions really do. numpy.correlate has two different behaviours depending on the numpy version. Internally these are known as multiarray.correlate (old) and multiarray.correlate2 (new). numpy.convolve reverses the second input vector and then uses multiarray.correlate (i.e. the one deprecated for correlation).
So, if you want to be really sure, you should test what happens. The basic operation is a sum of products between two vectors, where one vector is moved one position at a time. To clarify this, I'll use some numeric examples with two vectors.
a <= [1,2,3,4,5]
b <= [10,20]
Let's first look at convolve:
numpy.convolve(a,b,mode='full') => [ 10, 40, 70, 100, 130, 100]
This is because:
     1  2  3  4  5   => 1 x 10 = 10
 20 10
     1  2  3  4  5   => 1 x 20 + 2 x 10 = 40
    20 10
...
     1  2  3  4  5   => 5 x 20 = 100
                20 10
Different modes return the same data but truncated at each end.
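The sliding product above can be reproduced without numpy. The following is only a sketch: convolve_full is a made-up helper name, and the slice used for the valid part assumes that valid keeps exactly the positions where b lies completely inside a.

```python
def convolve_full(a, b):
    """Full convolution of two sequences (illustrative sketch).

    Every product a[i] * b[j] lands in output slot i + j, which is the
    same as sliding the mirrored b across a one position at a time.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


a = [1, 2, 3, 4, 5]
b = [10, 20]

full = convolve_full(a, b)
print(full)   # [10, 40, 70, 100, 130, 100]

# Assumption: 'valid' drops the len(b) - 1 partially overlapping
# positions at each end of the full result.
valid = full[len(b) - 1:len(a)]
print(valid)  # [40, 70, 100, 130]
```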
For correlation:
numpy.correlate(a,b,mode='full') => [ 20, 50, 80, 110, 140, 50]
     1  2  3  4  5   => 1 x 20 = 20
 10 20
     1  2  3  4  5   => 1 x 10 + 2 x 20 = 50
    10 20
...
     1  2  3  4  5   => 5 x 10 = 50
                10 20
So, basically the only difference with real numbers is that one of the vectors is mirrored. This has some consequences: convolution gives the same result if a and b are swapped, whereas correlation gives the reversed result in that case. With complex numbers, correlate conjugates the second vector prior to the calculations above.
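These relationships can be checked with a short pure-Python sketch. The helper names convolve_full and correlate_full are made up for this example; correlate_full is written as "conjugate and mirror the second vector, then convolve", which is an assumption based on the description above and not numpy's actual implementation.

```python
def convolve_full(a, b):
    """Full convolution: product a[i] * b[j] goes to output slot i + j."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def correlate_full(a, b):
    """Full cross-correlation (sketch): conjugate b, mirror it, convolve."""
    mirrored = [v.conjugate() for v in reversed(b)]
    return convolve_full(a, mirrored)


a = [1, 2, 3, 4, 5]
b = [10, 20]

corr_ab = correlate_full(a, b)
corr_ba = correlate_full(b, a)
print(corr_ab)  # [20, 50, 80, 110, 140, 50]
print(corr_ba)  # [50, 140, 110, 80, 50, 20]

# Swapping the inputs leaves convolution unchanged but reverses the
# correlation (for real-valued inputs).
print(convolve_full(a, b) == convolve_full(b, a))  # True
print(corr_ba == corr_ab[::-1])                    # True

# With complex input the second vector is conjugated: 1j * conj(1j) = 1.
print(correlate_full([1j], [1j]))  # [(1+0j)]
```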
Back to matplotlib's xcorr graph. It receives two vectors x and y with equal lengths and calculates the cross-correlation of these vectors at different lags.
It first calculates the full correlation of x and y with numpy.correlate, as shown above. Then it draws the correlation results from the full output vector at lags -maxlags..maxlags. The rule is that the second input vector is shifted. At the leftmost graph position the second vector y is at its leftmost position (i.e. shifted to the left from x).
The easiest way to check this may be:
xcorr([1.,2.,3.,4.,5.], [0,0,0,0,1.], normed=False, maxlags=4)
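To see which values that call should plot without opening a figure, the same lags can be computed by hand. This is a sketch under the assumption described above (the value at lag k is the sum of x[n + k] * y[n] over the overlapping samples, with no normalisation); xcorr_lags is a made-up helper, not matplotlib code.

```python
def xcorr_lags(x, y, maxlags):
    """Unnormalised cross-correlation of equal-length x and y.

    Illustrative sketch: returns the lags -maxlags..maxlags and, for
    each lag k, the sum of x[n + k] * y[n] over the overlapping samples.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have equal length")
    if not 0 <= maxlags < n:
        raise ValueError("maxlags must be between 0 and len(x) - 1")
    lags = list(range(-maxlags, maxlags + 1))
    values = []
    for k in lags:
        total = 0
        for i in range(n):
            if 0 <= i + k < n:
                total += x[i + k] * y[i]
        values.append(total)
    return lags, values


lags, values = xcorr_lags([1., 2., 3., 4., 5.], [0, 0, 0, 0, 1.], maxlags=4)
print(lags)    # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
print(values)  # [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0, 0.0, 0.0]
```

The single nonzero sample of y sits at its last index, so at lag -4 it lines up with the first sample of x, and at lag 0 with the last one. That is consistent with y being shifted to the left of x at the leftmost graph position.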