Combine two Pandas dataframes, resample on one time column, interpolate

Tags:

pandas

This is my first question on stackoverflow. Go easy on me!

I have two data sets acquired simultaneously by different acquisition systems with different sampling rates. One is very regular, and the other is not. I would like to create a single dataframe containing both data sets, using the regularly spaced timestamps (in seconds) as the reference for both. The irregularly sampled data should be interpolated on the regularly spaced timestamps.

Here's some toy data demonstrating what I'm trying to do:

import pandas as pd
import numpy as np

# evenly spaced times
t1 = np.array([0,0.5,1.0,1.5,2.0])
y1 = t1

# unevenly spaced times
t2 = np.array([0,0.34,1.01,1.4,1.6,1.7,2.01])
y2 = 3*t2

df1 = pd.DataFrame(data={'y1':y1,'t':t1})
df2 = pd.DataFrame(data={'y2':y2,'t':t2})

df1 and df2 look like this:

df1:
    t   y1
0  0.0  0.0
1  0.5  0.5
2  1.0  1.0
3  1.5  1.5
4  2.0  2.0

df2:
    t    y2
0  0.00  0.00
1  0.34  1.02
2  1.01  3.03
3  1.40  4.20
4  1.60  4.80
5  1.70  5.10
6  2.01  6.03

I'm trying to merge df1 and df2, interpolating y2 on df1.t. The desired result is:

df_combined:
     t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

I've been reading documentation for pandas.resample, as well as searching previous stackoverflow questions, but haven't been able to find a solution to my particular problem. Any ideas? Seems like it should be easy.

UPDATE: I figured out one possible solution: interpolate the second series first, then append to the first data frame:

from scipy.interpolate import interp1d
f2 = interp1d(t2,y2,bounds_error=False)
df1['y2'] = f2(df1.t)

which gives:

df1:
    t   y1   y2
0  0.0  0.0  0.0
1  0.5  0.5  1.5
2  1.0  1.0  3.0
3  1.5  1.5  4.5
4  2.0  2.0  6.0

That works, but I'm still open to other solutions if there's a better way.

964

asked Oct 22 '14 21:10

ollerend

1 Answers

If you construct a single DataFrame from Series, using time values as index, like this:

>>> t1 = np.array([0, 0.5, 1.0, 1.5, 2.0])
>>> y1 = pd.Series(t1, index=t1)

>>> t2 = np.array([0, 0.34, 1.01, 1.4, 1.6, 1.7, 2.01])
>>> y2 = pd.Series(3*t2, index=t2)

>>> df = pd.DataFrame({'y1': y1, 'y2': y2})
>>> df
       y1    y2
0.00  0.0  0.00
0.34  NaN  1.02
0.50  0.5   NaN
1.00  1.0   NaN
1.01  NaN  3.03
1.40  NaN  4.20
1.50  1.5   NaN
1.60  NaN  4.80
1.70  NaN  5.10
2.00  2.0   NaN
2.01  NaN  6.03

You can simply interpolate it, and select only the part where y1 is defined:

>>> df.interpolate('index').reindex(y1)
      y1   y2
0.0  0.0  0.0
0.5  0.5  1.5
1.0  1.0  3.0
1.5  1.5  4.5
2.0  2.0  6.0

115

answered Sep 28 '22 09:09

K3---rnc

Related questions
                            
                                Merging dataframes on an index is more efficient in Pandas
                            
                                Subclassing Numpy Array - Propagate Attributes
                            
                                Decorator in Java
                            
                                django aggregation: sum then average
                            
                                Pip "Could not find a version that satisfies the requirement pygame" [duplicate]
                            
                                Copying a worksheet with xlwings and python
                            
                                signal.alarm replacement in Windows [Python]
                            
                                How do I directly mock a superclass with python mock?
                            
                                How to convert a pitch track from a melody extraction algorithm to a humming like audio signal
                            
                                Representing rings of algebraic integers
                            
                                Does the Python range generator generate all values or yields them progressively? [duplicate]
                            
                                How to override the model.Manager.create() method in Django?
                            
                                Why does Python optimize out "if 0", but not "if None"?
                            
                                Why/How does Pandas use square brackets with .loc and .iloc?
                            
                                Simple get/post request blocked in python 3 but not in python 2
                            
                                Does Python have something as robust as Ruby's rvm?
                            
                                PHP - Equivalent for Python PDB(Debugger) in PHP? [closed]
                            
                                Python - SSL - wrong version number
                            
                                Weird IPython ipdb behaviour
                            
                                How do I upgrade the SQLite version used by Python's SQLite3 module on Mac?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With