Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to correlate two time series with gaps and different time bases?

I have two time series of 3D accelerometer data that have different time bases (clocks started at different times, with some very slight creep during the sampling time), as well as containing many gaps of different size (due to delays associated with writing to separate flash devices).

The accelerometers I'm using are the inexpensive GCDC X250-2. I'm running the accelerometers at their highest gain, so the data has a significant noise floor.

The time series each have about 2 million data points (over an hour at 512 samples/sec), and contain about 500 events of interest, where a typical event spans 100-150 samples (200-300 ms each). Many of these events are affected by data outages during flash writes.

So, the data isn't pristine, and isn't even very pretty. But my eyeball inspection shows it clearly contains the information I'm interested in. (I can post plots, if needed.)

The accelerometers are in similar environments but are only moderately coupled, meaning that I can tell by eye which events match from each accelerometer, but I have been unsuccessful so far doing so in software. Due to physical limitations, the devices are also mounted in different orientations, where the axes don't match, but they are as close to orthogonal as I could make them. So, for example, for 3-axis accelerometers A & B, +Ax maps to -By (up-down), +Az maps to -Bx (left-right), and +Ay maps to -Bz (front-back).

My initial goal is to correlate shock events on the vertical axis, though I would eventually like to a) automatically discover the axis mapping, b) correlate activity on the mapped aces, and c) extract behavior differences between the two accelerometers (such as twisting or flexing).

The nature of the times series data makes Python's numpy.correlate() unusable. I've also looked at R's Zoo package, but have made no headway with it. I've looked to different fields of signal analysis for help, but I've made no progress.

Anyone have any clues for what I can do, or approaches I should research?

Update 28 Feb 2011: Added some plots here showing examples of the data.

like image 899
BobC Avatar asked Feb 27 '11 01:02

BobC


People also ask

How do you find the correlation of a time series?

The serial correlation or autocorrelation of lag , , of a second order stationary time series is given by the autocovariance of the series normalised by the product of the spread. That is, ρ k = C k σ 2 . Note that ρ 0 = C 0 σ 2 = E [ ( x t − μ ) 2 ] σ 2 = σ 2 σ 2 = 1 .

What is cross-correlation in time series?

Cross correlation presents a technique for comparing two time series and finding objectively how they match up with each other, and in particular where the best match occurs. It can also reveal any periodicities in the data.

What is time lag correlation?

Time-lagged cross-correlation usually refers to the correlation between two time series shifted relatively in time. Time-lagged cross-correlations between time series have been studied and an analytic method has been widely applied in diverse fields [2], [3], [4], [5], [6], [7].


1 Answers

My interpretation of your question: Given two very long, noisy time series, find a shift of one that matches large 'bumps' in one signal to large bumps in the other signal.

My suggestion: interpolate the data so it's uniformly spaced, rectify and smooth the data (assuming the phase of the fast oscillations is uninteresting), and do a one-point-at-a-time cross correlation (assuming a small shift will line up the data).

import numpy
from scipy.ndimage import gaussian_filter
"""
sig1 and sig 2 are assumed to be large, 1D numpy arrays
sig1 is sampled at times t1, sig2 is sampled at times t2
t_start, t_end, is your desired sampling interval
t_len is your desired number of measurements
"""

t = numpy.linspace(t_start, t_end, t_len)
sig1 = numpy.interp(t, t1, sig1)
sig2 = numpy.interp(t, t2, sig2)
#Now sig1 and sig2 are sampled at the same points.

"""
Rectify and smooth, so 'peaks' will stand out.
This makes big assumptions about your data;
these assumptions seem true-ish based on your plots.
"""
sigma = 10 #Tune this parameter to get the right smoothing
sig1, sig2 = abs(sig1), abs(sig2)
sig1, sig2 = gaussian_filter(sig1, sigma), gaussian_filter(sig2, sigma)

"""
Now sig1 and sig2 should look smoothly varying, with humps at each 'event'.
Hopefully we can search a small range of shifts to find the maximum of the 
cross-correlation. This assumes your data are *nearly* lined up already.
"""
max_xc = 0
best_shift = 0
for shift in range(-10, 10): #Tune this search range
    xc = (numpy.roll(sig1, shift) * sig2).sum()
    if xc > max_xc:
        max_xc = xc
        best_shift = shift
print 'Best shift:', best_shift
"""
If best_shift is at the edges of your search range,
you should expand the search range.
"""
like image 175
Andrew Avatar answered Sep 29 '22 02:09

Andrew