I am working with two pandas Series that have Timestamps as indices. One Series is a coarse model with a fixed frequency; the other is data with no fixed frequency. I would like to subtract the model from the data, interpolating the model values (linearly or with a spline) at the timestamps of the data.
Here is an example:
import numpy as np
import pandas as pd
# generate model with fixed freq
model = pd.Series(range(5),index=pd.date_range('2017-06-19T12:05:00', '2017-06-19T12:25:00', freq="5 min"))
# generate data and add more_data to make frequency irregular
data = pd.Series(np.arange(10)+0.3,index=pd.date_range('2017-06-19T12:06:00',
'2017-06-19T12:24:00', freq="2 min"))
more_data = pd.Series([-10, -20], index=[pd.Timestamp('2017-06-19T12:07:35'),
pd.Timestamp('2017-06-19T12:09:10')])
# Series.append was removed in pandas 2.0; pd.concat works in old and new versions
data = pd.concat([data, more_data]).sort_index()
I tried
data - model.interpolate()[data.index]
but that only gives me non-NaN values where the timestamps of the model and the data overlap.
I understand that I could resample the data to fit the frequency of the model, but I want the data minus the model at the original timestamps of the data.
So with the help of this answer I figured out the solution to my problem, only interpolating at the points actually needed:
First, generate a Series of NaNs with the timestamps of data:
# explicit np.nan gives a float Series regardless of pandas version
na = pd.Series(np.nan, index=data.index)
and combine this with the model:
combi = model.combine_first(na)
This Series can now be interpolated and subtracted from the data
(data - combi.interpolate(method='time'))[data.index]
or as a one-liner
(data - model.combine_first(pd.Series(np.nan, index=data.index)).interpolate(method='time'))[data.index]
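For reference, the steps above can be collected into a small helper. This is only a sketch: the function name `subtract_interpolated` and the shortened sample data are made up for illustration, and it assumes both Series have sorted, timezone-naive DatetimeIndex indices and that the data timestamps fall inside the time range of the model.

```python
import numpy as np
import pandas as pd


def subtract_interpolated(data: pd.Series, model: pd.Series) -> pd.Series:
    """Return data minus the model, time-interpolated at data's timestamps.

    Hypothetical helper name; it only wraps the combine_first/interpolate steps.
    """
    # NaN placeholders at the data timestamps (explicit float dtype).
    gaps = pd.Series(np.nan, index=data.index, dtype=float)
    # Union of both indices: model values where known, NaN elsewhere.
    combined = model.astype(float).combine_first(gaps)
    # method="time" weights by the actual time distance between points.
    interpolated = combined.interpolate(method="time")
    # Keep only the original data timestamps.
    return data - interpolated.loc[data.index]


# Shortened version of the example from the question.
model = pd.Series(
    range(5),
    index=pd.date_range("2017-06-19T12:05:00", "2017-06-19T12:25:00", freq="5min"),
)
data = pd.Series(
    [0.3, -10.0, 1.3],
    index=pd.to_datetime(
        ["2017-06-19T12:06:00", "2017-06-19T12:07:35", "2017-06-19T12:08:00"]
    ),
)
result = subtract_interpolated(data, model)
print(result)
```

Only the timestamps that are actually needed get interpolated, so the cost grows with the number of data points rather than with the resolution of their timestamps.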
Idea:
You could find the gcd of the values in the index of data in nanoseconds, then resample the model to fit the frequency of the data.
Method:
Construct a gcd function for NumPy arrays using a method found here, and feed it data.index.astype(np.int64):
import math
divisor = np.ufunc.reduce(np.frompyfunc(math.gcd, 2, 1),
data.index.astype(np.int64))
divisor
Out[91]: 5000000000
Then resample model and proceed as before:
data - model.resample(str(divisor)+'ns').interpolate(method='time')[data.index]
Out[61]:
2017-06-19 12:06:00 0.100000
2017-06-19 12:07:35 -10.516667
2017-06-19 12:08:00 0.700000
2017-06-19 12:09:10 -20.833333
2017-06-19 12:10:00 1.300000
2017-06-19 12:12:00 1.900000
2017-06-19 12:14:00 2.500000
2017-06-19 12:16:00 3.100000
2017-06-19 12:18:00 3.700000
2017-06-19 12:20:00 4.300000
2017-06-19 12:22:00 4.900000
2017-06-19 12:24:00 5.500000
dtype: float64
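As a possible simplification, NumPy (1.15 and later) ships `np.gcd` as a ufunc, so the same divisor can be computed without `np.frompyfunc`. The sketch below is an alternative to the snippet above, not the method used there; the helper name `common_step_ns` is hypothetical, and it assumes a timezone-naive DatetimeIndex.

```python
import numpy as np
import pandas as pd


def common_step_ns(index: pd.DatetimeIndex) -> int:
    """Greatest common divisor of the epoch timestamps, in nanoseconds.

    Hypothetical helper; assumes a timezone-naive DatetimeIndex.
    """
    # Force nanosecond resolution first, then view the timestamps as integers.
    stamps = index.astype("datetime64[ns]").astype(np.int64).to_numpy()
    # np.gcd is a binary ufunc, so .reduce folds it over the whole array.
    return int(np.gcd.reduce(stamps))


index = pd.to_datetime(
    ["2017-06-19T12:06:00", "2017-06-19T12:07:35", "2017-06-19T12:09:10"]
)
step = common_step_ns(index)
print(step)
```

Note that the resampled model has one row per `step`, so if the data timestamps share only a very small common divisor, the intermediate Series can become very large.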