The interpolate method in pandas uses the valid data to interpolate the NaN values. However, it keeps the old valid data unchanged, as the following code shows.
Is there any way to use the interpolate method so that the old values are also changed and the series becomes smooth?
In [1]: %matplotlib inline
In [2]: from scipy.interpolate import UnivariateSpline as spl
In [3]: import numpy as np
In [4]: import pandas as pd
In [5]: samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 }
In [6]: x, y = zip(*sorted(samples.items()))
In [7]: df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
In [8]: df1.loc[x] = np.array(y)[:, None]
In [9]: df1['itp'].interpolate('spline', order=3, inplace=True)
In [10]: df1.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6))
In [11]: df2 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
In [12]: df2.loc[x, 'raw'] = y
In [13]: f = spl(x, y, k=3)
In [14]: df2['itp'] = f(df2.index)
In [15]: df2.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6))
The interpolate() function fills NA values in a DataFrame or Series. Rather than hard-coding a fill value, it estimates each missing value using one of several interpolation techniques. The axis parameter sets the direction: 0 fills column by column and 1 fills row by row.
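As a small illustration of the axis parameter described above, here is a minimal sketch using linear interpolation on a made-up 3x3 frame. The frame and the names by_column and by_row are hypothetical and chosen only so that the two directions give visibly different results.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a single missing value in the middle of the frame.
df = pd.DataFrame({
    "a": [0.0, 10.0, 20.0],
    "b": [1.0, np.nan, 3.0],
    "c": [2.0, 30.0, 40.0],
})

# axis=0: fill down each column, so the gap in "b" is bridged from 1.0 and 3.0.
by_column = df.interpolate(method="linear", axis=0)

# axis=1: fill across each row, so the gap is bridged from 10.0 and 30.0.
by_row = df.interpolate(method="linear", axis=1)

print(by_column.loc[1, "b"], by_row.loc[1, "b"])
```

The same NaN is filled with a different value depending on which neighbours the chosen axis exposes.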
When you use Series.interpolate with method='spline', under the hood pandas uses interpolate.UnivariateSpline.
The spline returned by UnivariateSpline is not guaranteed to pass through the data points given as input unless s=0.
However, by default s=None, in which case SciPy chooses a nonzero smoothing factor, so the fitted curve is a smoothed approximation rather than an exact interpolant and the result differs.
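The effect of the smoothing factor can be checked directly with SciPy. The following is a minimal sketch using the sample points from the question; the names exact, smooth, exact_err and smooth_err are introduced here only for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sample points from the question.
x = np.array([0.0, 0.4, 0.5, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.5, 0.9, 0.7, 0.3, 1.0])

# s=0 forces the cubic spline through every data point.
exact = UnivariateSpline(x, y, k=3, s=0)

# Default s=None lets SciPy smooth, so the curve may miss the points.
smooth = UnivariateSpline(x, y, k=3)

# Largest distance between each curve and the original data.
exact_err = np.max(np.abs(exact(x) - y))
smooth_err = np.max(np.abs(smooth(x) - y))

print(f"s=0:    max deviation at data points = {exact_err:.2e}")
print(f"s=None: max deviation at data points = {smooth_err:.2e}")
```

For this data the s=0 deviation is at the level of floating-point rounding, while the default spline visibly misses the input points.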
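The Series.interpolate method always fills in NaN values without changing the non-NaN values. There is no way to make Series.interpolate modify the non-NaN values. So, when s != 0, the filled-in values follow a smoothed spline that does not pass through the original points, while the original points stay where they were, and the result shows jagged jumps at those points.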
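A quick way to confirm that the known values are left alone is to compare them before and after the call. This is a minimal sketch, assuming the same 31-point grid as the question; the known points are set by position (known_pos, a name introduced here) to avoid relying on exact floating-point label matches.

```python
import numpy as np
import pandas as pd

# Same grid as the question: 31 evenly spaced points on [0, 1].
index = np.linspace(0, 1, 31)
raw = pd.Series(np.nan, index=index)

# Positions on the grid corresponding to x = 0.0, 0.4, 0.5, 0.6, 0.8, 1.0.
known_pos = [0, 12, 15, 18, 24, 30]
known_vals = np.array([0.0, 0.5, 0.9, 0.7, 0.3, 1.0])
raw.iloc[known_pos] = known_vals

# Default smoothing (s=None): only the NaN slots are filled.
filled = raw.interpolate(method="spline", order=3)

# The original points come back unchanged.
unchanged = np.array_equal(filled.iloc[known_pos].to_numpy(), known_vals)
print("known values unchanged:", unchanged)
```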
So if you want the s=None (default) spline interpolation but without the jagged jumps, as you've already found, you have to call UnivariateSpline directly and overwrite all the values in df['itp']:
df['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df.index)
If you want a cubic spline that passes through all the non-NaN data points, then use s=0:
df['itp'] = df['itp'].interpolate('spline', order=3, s=0)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.interpolate as interpolate
samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 }
x, y = zip(*sorted(samples.items()))
fig, ax = plt.subplots(nrows=3, sharex=True)
df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
df1.loc[x] = np.array(y)[:, None]
df2 = df1.copy()
df3 = df1.copy()
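# Assign the result back rather than using inplace=True on a column selection,
# which may not update the DataFrame in recent pandas versions.
df1['itp'] = df1['itp'].interpolate('spline', order=3)
df2['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df2.index)
df3['itp'] = df3['itp'].interpolate('spline', order=3, s=0)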
for i, df in enumerate((df1, df2, df3)):
    df.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6), ax=ax[i])
plt.show()