
Use of pandas.shift() to align datasets based on scipy.signal.correlate

I have datasets that look like the following: data0, data1, data2 (analogous to time versus voltage data)

If I load and plot the datasets using code like:

import pandas as pd
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

data0 = pd.read_csv('data0.csv')
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')

plt.plot(data0.x, data0.y, data1.x, data1.y, data2.x, data2.y)

I get something like:

plotting all three datasets

Now I try to correlate data0 with data1:

shft01 = np.argmax(signal.correlate(data0.y, data1.y)) - len(data1.y)
print(shft01)
plt.figure()
plt.plot(data0.x, data0.y,
         data1.x.shift(-shft01), data1.y)
fig = plt.gcf()

with output:

-99

and

shifted version of data1

which works just as expected! But if I try the same thing with data2, I get a plot that looks like:

shifted version of data2

with a positive shift of 410. I think I am just not understanding how the pandas shift() method works, but I was hoping that I could use it to align my datasets. As far as I understand, the return value of correlate() tells me how far apart my datasets are, so I should be able to use shift() to overlap them.

not link, asked Oct 28 '13 18:10


2 Answers

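The pandas shift() method is not the right way to move a curve along the x-axis: it slides the values to different index positions and fills the gap with NaN, rather than changing the values themselves. Instead, add an offset to the x values of the points: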

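plt.plot(data0.x, data0.y)
# Sample spacing of the reference x axis (assumes data0.x is uniformly sampled)
dx = np.mean(np.diff(data0.x.values))
for target in [data1, data2]:
    # Position of the correlation peak in samples, converted to x units
    shift = (np.argmax(signal.correlate(data0.y, target.y)) - len(target.y)) * dx
    plt.plot(target.x + shift, target.y)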

Here is the output:

enter image description here
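To make the difference between the two operations concrete, here is a minimal sketch on a made-up six-point dataset (the DataFrame and its values are hypothetical, not the asker's data). It assumes a uniformly sampled x column, as the answer's dx calculation does.

```python
import numpy as np
import pandas as pd

# Hypothetical uniformly sampled x axis with a simple ramp of y values
df = pd.DataFrame({"x": np.arange(6) * 0.5, "y": np.arange(6) * 10.0})

# shift(2) slides the x values two rows down the index: rows 0 and 1
# become NaN and the last two x values fall off the end
x_shifted = df.x.shift(2)

# Adding an offset keeps all six points and translates them by 2 samples
dx = np.mean(np.diff(df.x.values))
x_translated = df.x + 2 * dx

print(x_shifted.tolist())
print(x_translated.tolist())
```

Plotting y against x_shifted would silently drop every point whose x became NaN, whereas x_translated still has one valid x value per y value.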

HYRY, answered Nov 17 '22 07:11


@HYRY, one correction to your answer: there is an off-by-one error, because len() returns the number of samples N, while np.argmax() returns a zero-based index. The line should read:

shift = (np.argmax(signal.correlate(data0.y, target.y)) - (len(target.y)-1)) * dx

For example, in the case where your signals are already aligned:

len(target.y) = N (a count)

The full cross-correlation has length 2N-1, so for aligned data the peak sits at the center value:

np.argmax(signal.correlate(data0.y, target.y)) = N - 1 (zero-based index)

With the original line this gives shift = ((N-1) - N) * dx = (-1) * dx, when we really want 0 * dx.
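The off-by-one can be checked on synthetic data. The sketch below uses made-up Gaussian pulses (the pulse() and lag_samples() helpers are illustrative names, not part of either answer) and compares the corrected formula with scipy.signal.correlation_lags, which assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy import signal

n = 200
t = np.arange(n)

def pulse(center):
    """Hypothetical test signal: a Gaussian pulse centered at `center`."""
    return np.exp(-0.5 * ((t - center) / 5.0) ** 2)

ref = pulse(80)
aligned = pulse(80)    # identical to the reference
delayed = pulse(110)   # same pulse, 30 samples later

def lag_samples(reference, target):
    """Lag of the correlation peak in samples, using the corrected offset."""
    corr = signal.correlate(reference, target)  # mode='full', length 2N-1
    return int(np.argmax(corr)) - (len(target) - 1)

lag_same = lag_samples(ref, aligned)
lag_delayed = lag_samples(ref, delayed)

# The same lag read from SciPy's own lag axis (requires SciPy >= 1.6)
lags = signal.correlation_lags(len(ref), len(delayed))
lag_from_scipy = int(lags[np.argmax(signal.correlate(ref, delayed))])

# What the uncorrected formula reports for already-aligned signals
lag_uncorrected = int(np.argmax(signal.correlate(ref, aligned))) - len(aligned)

print(lag_same, lag_delayed, lag_from_scipy, lag_uncorrected)
```

A negative lag means the target has to move toward smaller x, so adding lag * dx to target.x brings the delayed pulse back onto the reference.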

AhabTheArab, answered Nov 17 '22 07:11