I was wondering if someone could point me to an algorithm/technique that is used to compare time dependent signals. Ideally, this hypothetical algorithm would take in 2 signals as inputs and return a number that would be the percentage similarity between the signals (0 being that the 2 signals are statistically unrelated and 1 being that they are a perfect match).
Of course, I realize that there are problems with my request, namely that I'm not sure how to properly define 'similarity' in the context of comparing these 2 signals, so if someone could also point me in the right direction (as to what I should look up/know, etc.), I'd appreciate it as well.
The cross-correlation function is the classic signal processing solution. If you have access to Matlab, see the XCORR function: max(abs(xcorr(Signal1, Signal2, 'coeff'))) gives you specifically what you're looking for, and an equivalent exists in Python as well.
Cross-correlation assumes that the "similarity" you're looking for is a measure of the linear relationship between the two signals. The definition for real-valued finite-length signals with time index n = 0..N-1 is:
C[g] = sum{m = 0..N-1} (x1[m] * x2[g+m])
where g runs from -N..N (outside that range the product inside the sum is 0).
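The definition above can be written out directly. This is a minimal sketch using only the standard library so the arithmetic stays visible; the function name cross_correlation and the equal-length restriction are assumptions made for illustration, not part of the answer.

```python
def cross_correlation(x1, x2):
    """Return (lags, values) where values[i] = sum over m of x1[m] * x2[lags[i] + m]."""
    n = len(x1)
    if len(x2) != n:
        raise ValueError("this sketch assumes equal-length signals")
    lags = list(range(-(n - 1), n))  # every lag at which the signals still overlap
    values = []
    for g in lags:
        total = 0.0
        for m in range(n):
            k = g + m
            if 0 <= k < n:  # samples outside 0..N-1 are treated as zero
                total += x1[m] * x2[k]
        values.append(total)
    return lags, values


lags, c = cross_correlation([1, 2, 3], [0, 1, 2])
print(dict(zip(lags, c)))  # {-2: 0.0, -1: 3.0, 0: 8.0, 1: 5.0, 2: 2.0}
```

The double loop is O(N^2), which is fine for short signals; for long ones a library routine would be the usual choice.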
Although you asked for a number, the function itself is pretty interesting. The function domain g is called the lag domain.
If x1 and x2 are related by a time shift, the cross-correlation function will have its peak at the lag corresponding to the shift. For instance, if you had x1 = sin[wn] and x2 = sin[wn + phi], two sine waves at the same frequency but different phase, the cross-correlation function would have its peak at the lag corresponding to the phase shift.
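To illustrate the peak-at-the-shift behaviour, here is a small sketch. It uses a short pulse instead of the sine example, because a periodic signal repeats its peak every period and the pulse gives a single unambiguous answer. The helper names (cross_correlation, peak_lag) and the test signal are made up for the example.

```python
def cross_correlation(x1, x2):
    """C[g] = sum over m of x1[m] * x2[g + m], for every overlapping lag g."""
    n = len(x1)
    lags = list(range(-(n - 1), n))
    values = [
        sum(x1[m] * x2[g + m] for m in range(n) if 0 <= g + m < n)
        for g in lags
    ]
    return lags, values


def peak_lag(x1, x2):
    """Lag at which the cross-correlation has its largest magnitude."""
    lags, values = cross_correlation(x1, x2)
    best = max(range(len(values)), key=lambda i: abs(values[i]))
    return lags[best]


x1 = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
delay = 3
x2 = [0] * delay + x1[:-delay]  # x1 delayed by 3 samples

print(peak_lag(x1, x2))  # 3
print(peak_lag(x2, x1))  # -3
```

Swapping the arguments flips the sign of the lag, so it is worth fixing a convention for which signal counts as the reference.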
If x2 is a scaled version of x1, the cross-correlation will scale also. You can normalize the function to a correlation coefficient by dividing by sqrt(sum(x1^2)*sum(x2^2)), and bring it into 0..1 by taking an absolute value (that line of Matlab does both operations).
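Putting the normalization and the absolute value together gives the single number the question asks for. The sketch below is a standard-library approximation of what the Matlab one-liner computes for equal-length real signals; the name similarity and the zero-signal guard are assumptions added here.

```python
import math


def similarity(x1, x2):
    """Largest |C[g]| over all lags, divided by sqrt(sum(x1^2) * sum(x2^2))."""
    n = len(x1)
    if len(x2) != n:
        raise ValueError("this sketch assumes equal-length signals")
    norm = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    if norm == 0.0:
        return 0.0  # an all-zero signal carries no information; treat as unrelated
    best = 0.0
    for g in range(-(n - 1), n):
        lo, hi = max(0, -g), min(n, n - g)  # keep g + m inside 0..N-1
        c = sum(x1[m] * x2[g + m] for m in range(lo, hi))
        best = max(best, abs(c))
    return best / norm


x = [1.0, -2.0, 3.0, 0.5]
print(similarity(x, x))                     # identical signals
print(similarity(x, [-3.0 * a for a in x])) # scaled and inverted copy
```

Both calls score 1 up to rounding: scaling cancels in the normalization, and the absolute value removes the sign.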
More generally, below is a summary of what cross-correlation is good/bad for.
Cross-correlation works well for determining if one signal is linearly related to another, that is, if
x2(t) = sum{n = 0..K-1}(A_n * x1(t + phi_n))
where x1(t) and x2(t) are the signals in question, A_n are scaling factors, and phi_n are time shifts. The implications of this are:
If one signal is a time-shifted version of the other (phi_n <> 0 for some n), the cross-correlation function will be non-zero.
If one signal is a scaled version of the other (A_n <> 0 for some n), the cross-correlation function will be non-zero.
If one signal is a combination of scaled and time-shifted versions of the other (A_n and phi_n are non-zero for some number of n's), the cross-correlation function will be non-zero. Note that this is also a definition of a linear filter.
To get more concrete, suppose x1 is a wideband random signal. Let x2=x1. Now the normalized cross-correlation function will be exactly 1 at g=0, and near 0 everywhere else. Now let x2 be a (linearly) filtered version of x1. The cross-correlation function will be non-zero near g=0. The width of the non-zero part will depend on the bandwidth of the filter.
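The wideband example can be tried numerically. In this sketch the "wideband random signal" is seeded Gaussian noise and the "linear filter" is a 5-point causal moving average; both are illustrative choices made here, as are the helper names.

```python
import math
import random


def normalized_xcorr(x1, x2, max_lag):
    """Map each lag g in -max_lag..max_lag to C[g] / sqrt(sum(x1^2) * sum(x2^2))."""
    n = len(x1)
    norm = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    result = {}
    for g in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -g), min(n, n - g)  # keep g + m inside 0..N-1
        result[g] = sum(x1[m] * x2[g + m] for m in range(lo, hi)) / norm
    return result


def moving_average(x, width):
    """Causal moving average: a simple linear low-pass filter (illustrative choice)."""
    return [sum(x[max(0, k - width + 1):k + 1]) / width for k in range(len(x))]


rng = random.Random(0)  # fixed seed so the run is repeatable
x1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]

same = normalized_xcorr(x1, x1, max_lag=30)
filtered = normalized_xcorr(x1, moving_average(x1, 5), max_lag=30)

# Columns: lag, x2 = x1, x2 = filtered x1
for g in (-10, -1, 0, 2, 4, 10):
    print(g, round(same[g], 3), round(filtered[g], 3))
```

With x2 = x1 only lag 0 stands out. With the filtered copy the raised region should cover lags 0 to 4, matching the 5-sample filter length, at a lower height than 1.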
For the special case of x1 and x2 being periodic, the information on the phase shift in the original part of the answer applies.
Where cross-correlation will not help is if the two signals are not linearly related. For instance, two periodic signals at different frequencies are not linearly related. Nor are two random signals drawn from a wideband random process at different times. Nor are two signals that are similar in shape but with different time indexing - this is like the unequal fundamental frequency case.
In all cases, normalizing the cross-correlation function and looking at the maximum value will tell you if the signals are potentially linearly related. If the number is low, like under 0.1, I would be comfortable declaring them unrelated. Higher than that and I'd look into it more carefully, graphing both the normalized and unnormalized cross-correlation functions and looking at the structure. A periodic cross-correlation implies both signals are periodic, and a cross-correlation function that is noticeably higher around g=0 implies one signal is a filtered version of the other.
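As a rough check of the "maximum of the normalized function" test, the sketch below scores two independent noise records against each other and then scores one record against a delayed copy of itself. The signals, seed, record length, and function name are all assumptions for illustration. Short records of unrelated noise can score somewhat above zero purely by chance, so any cutoff is best read as a rule of thumb that depends on signal length.

```python
import math
import random


def max_normalized_xcorr(x1, x2):
    """Largest |C[g]| over all lags, divided by sqrt(sum(x1^2) * sum(x2^2))."""
    n = len(x1)
    norm = math.sqrt(sum(a * a for a in x1) * sum(b * b for b in x2))
    best = 0.0
    for g in range(-(n - 1), n):
        lo, hi = max(0, -g), min(n, n - g)  # keep g + m inside 0..N-1
        best = max(best, abs(sum(x1[m] * x2[g + m] for m in range(lo, hi))))
    return best / norm


rng = random.Random(1)  # fixed seed so the run is repeatable
noise_a = [rng.gauss(0.0, 1.0) for _ in range(600)]
noise_b = [rng.gauss(0.0, 1.0) for _ in range(600)]  # independent of noise_a
delayed_a = [0.0] * 25 + noise_a[:-25]               # noise_a shifted by 25 samples

unrelated = max_normalized_xcorr(noise_a, noise_b)
related = max_normalized_xcorr(noise_a, delayed_a)
print("unrelated:", round(unrelated, 3))
print("related:  ", round(related, 3))
```

The delayed copy scores close to 1 and the independent record scores far lower, which is the separation the test relies on.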
You could try a Fast Fourier Transform (look up FFT on Wikipedia; there are open source libraries that compute it).
An FFT transforms your data from the time domain (e.g. a pulse at 1s, 2s, 3s, 4s...) to the frequency domain (e.g. one pulse per second).
Then you can compare frequencies and their relative strengths more easily. It should be a step in the right direction for you.
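A sketch of that idea, with two caveats. First, it uses a naive O(N^2) discrete Fourier transform from the standard library as a stand-in for an FFT library; the values are the same, only slower to compute. Second, comparing magnitude spectra with a cosine similarity is one possible choice made here for illustration, not something this answer prescribes. All function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(x):
    """Magnitudes of DFT bins 0..N/2 of a real signal (naive DFT, not a fast one)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def dominant_bin(x):
    """Index of the strongest non-DC frequency bin."""
    mags = magnitude_spectrum(x)
    return max(range(1, len(mags)), key=lambda k: mags[k])


def spectral_similarity(x1, x2):
    """Cosine similarity of the two magnitude spectra; ignores phase entirely."""
    m1, m2 = magnitude_spectrum(x1), magnitude_spectrum(x2)
    dot = sum(a * b for a, b in zip(m1, m2))
    return dot / math.sqrt(sum(a * a for a in m1) * sum(b * b for b in m2))


n = 64
a = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]        # 4 cycles
b = [math.sin(2 * math.pi * 4 * t / n + 1.0) for t in range(n)]  # 4 cycles, shifted phase
c = [math.sin(2 * math.pi * 7 * t / n) for t in range(n)]        # 7 cycles

print(dominant_bin(a), dominant_bin(b), dominant_bin(c))  # 4 4 7
print(round(spectral_similarity(a, b), 6), round(spectral_similarity(a, c), 6))
```

Because the magnitude spectrum discards phase, a and b look identical here even though they are shifted in time; whether that is desirable depends on what "similar" should mean for your signals.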