
Subtract two Series with irregular and regular timestamps in pandas with interpolation

I am working with two pandas Series that have Timestamps as indices. One Series is a coarse model with a fixed frequency; the other is data with no fixed frequency. I would like to subtract the model from the data, (linearly or spline) interpolating the model values at the timestamps of the data.

Here is an example:

import numpy as np
import pandas as pd


# generate model with fixed freq
model = pd.Series(range(5),index=pd.date_range('2017-06-19T12:05:00', '2017-06-19T12:25:00', freq="5 min"))

# generate data and add more_data to make frequency irregular
data = pd.Series(np.arange(10)+0.3,index=pd.date_range('2017-06-19T12:06:00', 
'2017-06-19T12:24:00', freq="2 min"))
more_data = pd.Series([-10, -20], index=[pd.Timestamp('2017-06-19T12:07:35'), 
pd.Timestamp('2017-06-19T12:09:10')])
# Series.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([data, more_data]).sort_index()

I tried

data - model.interpolate()[data.index]

but that only gives me non-NaN values where the timestamps of the model and the data overlap.

I understand that I could resample the data to fit the frequency of the model, but I want the data minus the model at the original timestamps of the data.
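To make the failure concrete, here is a minimal sketch of why the attempt only yields values at shared timestamps. It assumes a float-valued copy of the model and uses `.reindex` rather than `[...]`, since label indexing with missing labels raises a `KeyError` in recent pandas versions; the names `data_index`, `picked` and `shared` are illustrative only.

```python
import numpy as np
import pandas as pd

model = pd.Series(
    np.arange(5, dtype=float),
    index=pd.date_range("2017-06-19T12:05:00", "2017-06-19T12:25:00", freq="5min"),
)
# Timestamps of the irregular data (values are not needed for this check)
data_index = pd.date_range(
    "2017-06-19T12:06:00", "2017-06-19T12:24:00", freq="2min"
).union(pd.to_datetime(["2017-06-19T12:07:35", "2017-06-19T12:09:10"]))

# The model has no NaNs, so interpolate() has nothing to fill
# and returns the model unchanged.
unchanged = model.interpolate().equals(model)

# Selecting the data timestamps only finds labels that already exist
# in the model; every other timestamp comes back as NaN.
picked = model.reindex(data_index)
shared = model.index.intersection(data_index)
print(unchanged, list(shared), int(picked.notna().sum()))
```

Interpolation can only fill positions that are already present in the index as NaN, so the data timestamps have to be added to the model's index before interpolating.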

frankundfrei asked Nov 14 '25 23:11


2 Answers

So with the help of this answer I figured out the solution to my problem, only interpolating at the points actually needed:

First, generate a Series of NaNs with the timestamps of data:

na = pd.Series(np.nan, index=data.index)

and combine this with the model:

combi = model.combine_first(na)

This Series can now be interpolated and subtracted from the data

(data - combi.interpolate(method='time'))[data.index]

or as a one-liner

(data - model.combine_first(pd.Series(np.nan, index=data.index)).interpolate(method='time'))[data.index]
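
For reference, here is a self-contained sketch of the same idea that should run on current pandas. It swaps `combine_first` for a `reindex` onto the union of both indices, which has the same effect here; the helper name `subtract_interpolated` is hypothetical, and the sample data is rebuilt with `pd.concat` as an assumption about how the question's example would look today.

```python
import numpy as np
import pandas as pd

model = pd.Series(
    np.arange(5, dtype=float),
    index=pd.date_range("2017-06-19T12:05:00", "2017-06-19T12:25:00", freq="5min"),
)
data = pd.concat([
    pd.Series(
        np.arange(10) + 0.3,
        index=pd.date_range("2017-06-19T12:06:00", "2017-06-19T12:24:00", freq="2min"),
    ),
    pd.Series(
        [-10.0, -20.0],
        index=pd.to_datetime(["2017-06-19T12:07:35", "2017-06-19T12:09:10"]),
    ),
]).sort_index()


def subtract_interpolated(data, model):
    """Subtract the model from data, interpolating the model in time."""
    # Put the model on the union of both indices; new timestamps become NaN.
    union = model.index.union(data.index)
    # method="time" weights by the actual time gaps, not by row position.
    model_on_union = model.reindex(union).interpolate(method="time")
    # Keep only the original data timestamps.
    return data - model_on_union.reindex(data.index)


residual = subtract_interpolated(data, model)
print(residual)
```

Only the timestamps that are actually needed get interpolated, so no dense intermediate grid is built.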

frankundfrei answered Nov 17 '25 21:11


Idea:

You could find the gcd of the values in the index of data in nanoseconds, then resample the model to fit the frequency of the data.

Method:

Construct a gcd function for numpy arrays using a method found here, and feed it data.index.astype(np.int64):

import math

# reduce math.gcd over the timestamps expressed as integer nanoseconds
divisor = np.ufunc.reduce(np.frompyfunc(math.gcd, 2, 1),
                          data.index.astype(np.int64))
divisor
Out[91]: 5000000000

Then resample model and proceed as before:

data - model.resample(str(divisor)+'ns').interpolate(method='time')[data.index]
Out[61]: 
2017-06-19 12:06:00     0.100000
2017-06-19 12:07:35   -10.516667
2017-06-19 12:08:00     0.700000
2017-06-19 12:09:10   -20.833333
2017-06-19 12:10:00     1.300000
2017-06-19 12:12:00     1.900000
2017-06-19 12:14:00     2.500000
2017-06-19 12:16:00     3.100000
2017-06-19 12:18:00     3.700000
2017-06-19 12:20:00     4.300000
2017-06-19 12:22:00     4.900000
2017-06-19 12:24:00     5.500000
dtype: float64
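
As a rough, self-contained variant of the steps above, the sketch below uses NumPy's built-in `np.gcd` ufunc in place of the `frompyfunc` wrapper. Computing the nanosecond counts by subtracting the epoch is my own assumption, chosen so the result does not depend on the index's internal time unit; the names `ns`, `fine_model` and `residual` are illustrative.

```python
import numpy as np
import pandas as pd

model = pd.Series(
    np.arange(5, dtype=float),
    index=pd.date_range("2017-06-19T12:05:00", "2017-06-19T12:25:00", freq="5min"),
)
data = pd.concat([
    pd.Series(
        np.arange(10) + 0.3,
        index=pd.date_range("2017-06-19T12:06:00", "2017-06-19T12:24:00", freq="2min"),
    ),
    pd.Series(
        [-10.0, -20.0],
        index=pd.to_datetime(["2017-06-19T12:07:35", "2017-06-19T12:09:10"]),
    ),
]).sort_index()

# Nanoseconds since the Unix epoch for every data timestamp
epoch = pd.Timestamp("1970-01-01")
ns = ((data.index - epoch) // pd.Timedelta(1, unit="ns")).to_numpy(dtype=np.int64)

# Greatest common divisor of all timestamps, in nanoseconds
divisor = int(np.gcd.reduce(ns))

# Upsample the model to that step, interpolate in time,
# then pick out the data timestamps.
fine_model = model.resample(pd.Timedelta(divisor, unit="ns")).interpolate(method="time")
residual = data - fine_model.reindex(data.index)
print(divisor)
print(residual)
```

Note that the resampled grid grows quickly when the gcd is small (for example, timestamps with millisecond jitter), so this approach fits best when the data timestamps share a reasonably coarse common step.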

EFT answered Nov 17 '25 22:11


