I have one set of values measured at regular times. Say:
import pandas as pd
import numpy as np
rng = pd.date_range('2013-01-01', periods=12, freq='H')
data = pd.Series(np.random.randn(len(rng)), index=rng)
And another set of more arbitrary times, for example (in reality these times are not a regular sequence):
ts_rng = pd.date_range('2013-01-01 01:11:21', periods=7, freq='87Min')
ts = pd.Series(index=ts_rng, dtype=float)  # explicit dtype: all-NaN float placeholders
I want to know the value of data interpolated at the times in ts.
I can do this in numpy:
x = np.asarray(ts_rng,dtype=np.float64)
xp = np.asarray(data.index,dtype=np.float64)
fp = np.asarray(data)
ts[:] = np.interp(x,xp,fp)
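For reference, the NumPy approach above can be wrapped in a small helper. This is only a sketch: the name interp_numpy is made up for illustration, and it assumes both indexes are DatetimeIndex objects with the same resolution, so that their integer representations are comparable. Note that np.interp does not extrapolate; query times outside the measured range receive the first or last measured value.

```python
import numpy as np
import pandas as pd


def interp_numpy(series, new_index):
    """Linearly interpolate `series` at the timestamps in `new_index`.

    Hypothetical helper; assumes `series.index` is sorted and both
    indexes are DatetimeIndex objects with the same resolution.
    """
    # Use the integer representation of the timestamps as x-coordinates.
    x = new_index.astype("int64").to_numpy()
    xp = series.index.astype("int64").to_numpy()
    fp = series.to_numpy(dtype=float)
    return pd.Series(np.interp(x, xp, fp), index=new_index)


# Usage with an hourly series and two off-grid query times.
hourly = pd.Series(
    [0.0, 1.0, 2.0],
    index=pd.date_range("2013-01-01", periods=3, freq="60min"),
)
query = pd.DatetimeIndex(["2013-01-01 00:30", "2013-01-01 01:45"])
print(interp_numpy(hourly, query))
```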
I feel pandas has this functionality somewhere in resample, reindex, etc., but I can't quite get it.
Linear interpolation works best when there are many data points.
Interpolation is mostly used with time-series data, because there we usually prefer to fill a missing value from the one or two values that precede it. For example, with temperature readings we would rather fill in today's missing temperature with the mean of the last two days than with the mean of the whole month.
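You can concatenate the two time series and sort by index. Since the values in the second series are NaN, you can interpolate and then just select out the values that represent the points from the second series. Note that interpolate() defaults to method='linear', which ignores the index and treats the rows as equally spaced; pass method='time' to weight by the actual timestamps, as np.interp does:
pd.concat([data, ts]).sort_index().interpolate(method='time').reindex(ts.index)
or
pd.concat([data, ts]).sort_index().interpolate(method='time')[ts.index]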
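A variant of the same idea, sketched here with a hypothetical helper name (interp_at), reindexes onto the union of the two indexes instead of concatenating. The union contains each timestamp only once, so it also copes with query times that coincide with measured times. The sample data below is made up so that the result is easy to check by eye: each value equals the number of hours since the first measurement.

```python
import numpy as np
import pandas as pd


def interp_at(series, new_index):
    """Interpolate `series` linearly in time at the stamps in `new_index`."""
    # The union is sorted and holds each timestamp only once.
    union = series.index.union(new_index)
    # method="time" weights by the actual time gaps between rows.
    filled = series.reindex(union).interpolate(method="time")
    return filled.reindex(new_index)


# Illustrative data: value == hours elapsed since the first stamp.
rng = pd.date_range("2013-01-01", periods=12, freq="60min")
data = pd.Series(np.arange(len(rng), dtype=float), index=rng)
ts_index = pd.date_range("2013-01-01 01:11:21", periods=7, freq="87min")

result = interp_at(data, ts_index)
print(result)
```

Because the sample values grow by exactly one per hour, every interpolated value should equal the elapsed time in hours at its timestamp, which is a quick sanity check that the index is really being used.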
Assume you would like to evaluate a time series ts on a different datetime_index. This index and the index of ts may overlap. I recommend using the following groupby trick, which essentially gets rid of duplicate timestamps. I then forward-fill, but feel free to apply fancier methods:
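def interpolate(ts, datetime_index):
    # Append NaN placeholders at the new timestamps.
    x = pd.concat([ts, pd.Series(index=datetime_index, dtype=float)])
    # Collapse duplicate stamps (first non-null value wins), then forward-fill.
    return x.groupby(x.index).first().sort_index().ffill()[datetime_index]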
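If a forward fill (last observation carried forward) is all that is needed, a shorter route may be reindex with method="ffill". The sketch below is not the code above but a variant of it: the helper name previous_value is made up, and repeated timestamps are resolved by keeping the first row, whereas groupby(...).first() keeps the first non-null value.

```python
import pandas as pd


def previous_value(series, new_index):
    """Return, for each stamp in `new_index`, the latest value at or before it."""
    # Drop repeated timestamps (keep the first row) and sort,
    # because reindex with a fill method needs a monotonic index.
    unique = series[~series.index.duplicated(keep="first")].sort_index()
    return unique.reindex(new_index, method="ffill")


# Usage: hourly observations, queried at off-grid and on-grid times.
obs = pd.Series(
    [0.0, 1.0, 2.0],
    index=pd.date_range("2013-01-01", periods=3, freq="60min"),
)
when = pd.DatetimeIndex(
    ["2013-01-01 00:30", "2013-01-01 01:00", "2013-01-01 02:45"]
)
print(previous_value(obs, when))
```

Query times earlier than the first observation have nothing to carry forward and come back as NaN.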