This is my first question on stackoverflow. Go easy on me!
I have two data sets acquired simultaneously by different acquisition systems with different sampling rates. One is very regular, and the other is not. I would like to create a single dataframe containing both data sets, using the regularly spaced timestamps (in seconds) as the reference for both. The irregularly sampled data should be interpolated on the regularly spaced timestamps.
Here's some toy data demonstrating what I'm trying to do:
import pandas as pd
import numpy as np
# evenly spaced times
t1 = np.array([0,0.5,1.0,1.5,2.0])
y1 = t1
# unevenly spaced times
t2 = np.array([0,0.34,1.01,1.4,1.6,1.7,2.01])
y2 = 3*t2
df1 = pd.DataFrame(data={'y1':y1,'t':t1})
df2 = pd.DataFrame(data={'y2':y2,'t':t2})
df1 and df2 look like this:
df1:
t y1
0 0.0 0.0
1 0.5 0.5
2 1.0 1.0
3 1.5 1.5
4 2.0 2.0
df2:
t y2
0 0.00 0.00
1 0.34 1.02
2 1.01 3.03
3 1.40 4.20
4 1.60 4.80
5 1.70 5.10
6 2.01 6.03
I'm trying to merge df1 and df2, interpolating y2 on df1.t. The desired result is:
df_combined:
t y1 y2
0 0.0 0.0 0.0
1 0.5 0.5 1.5
2 1.0 1.0 3.0
3 1.5 1.5 4.5
4 2.0 2.0 6.0
I've been reading documentation for pandas.resample, as well as searching previous stackoverflow questions, but haven't been able to find a solution to my particular problem. Any ideas? Seems like it should be easy.
UPDATE: I figured out one possible solution: interpolate the second series first, then append to the first data frame:
from scipy.interpolate import interp1d
f2 = interp1d(t2,y2,bounds_error=False)
df1['y2'] = f2(df1.t)
which gives:
df1:
t y1 y2
0 0.0 0.0 0.0
1 0.5 0.5 1.5
2 1.0 1.0 3.0
3 1.5 1.5 4.5
4 2.0 2.0 6.0
That works, but I'm still open to other solutions if there's a better way.
interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs) Parameters : method : {'linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'polynomial', 'spline', 'piecewise_polynomial', ' ...
The Fastest Ways As it turns out, join always tends to perform well, and merge will perform almost exactly the same given the syntax is optimal.
interpolate() function is basically used to fill NA values in the dataframe or series. But, this is a very powerful function to fill the missing values. It uses various interpolation technique to fill the missing values rather than hard-coding the value. axis : 0 fill column-by-column and 1 fill row-by-row.
If you construct a single DataFrame from Series, using time values as index, like this:
>>> t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
>>> y1 = pd.Series(t1, index=t1)
>>> t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
>>> y2 = pd.Series(3*t2, index=t2)
>>> df = pd.DataFrame({'y1': y1, 'y2': y2})
>>> df
y1 y2
0.00 0.0 0.00
0.34 NaN 1.02
0.50 0.5 NaN
1.00 1.0 NaN
1.01 NaN 3.03
1.40 NaN 4.20
1.50 1.5 NaN
1.60 NaN 4.80
1.70 NaN 5.10
2.00 2.0 NaN
2.01 NaN 6.03
You can simply interpolate it, and select only the part where y1
is defined:
>>> df.interpolate('index').reindex(y1)
y1 y2
0.0 0.0 0.0
0.5 0.5 1.5
1.0 1.0 3.0
1.5 1.5 4.5
2.0 2.0 6.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With