
How to use `Series.interpolate` in pandas with the old values modified

The interpolate method in pandas uses the valid data to fill in the NaN values. However, it leaves the existing valid data unchanged, as the following code shows.

Is there any way to use the interpolate method so that the old values are changed as well and the series becomes smooth?

In [1]: %matplotlib inline
In [2]: from scipy.interpolate import UnivariateSpline as spl
In [3]: import numpy as np
In [4]: import pandas as pd
In [5]: samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 }
In [6]: x, y = zip(*sorted(samples.items()))

In [7]: df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)

In [8]: df1.loc[x] = np.array(y)[:, None]
In [9]: df1['itp'].interpolate('spline', order=3, inplace=True)
In [10]: df1.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6))


In [11]: df2 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
In [12]: df2.loc[x, 'raw'] = y
In [13]: f = spl(x, y, k=3)
In [14]: df2['itp'] = f(df2.index)
In [15]: df2.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6))


Eastsun asked Aug 15 '15 09:08


1 Answer

When you use Series.interpolate with method='spline', under the hood pandas uses scipy.interpolate.UnivariateSpline.

The spline returned by UnivariateSpline is not guaranteed to pass through the data points given as input unless s=0. However, the default is s=None, which makes SciPy choose a nonzero smoothing factor, so the fitted curve generally does not pass through the input points.
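The difference between the two settings can be checked directly against SciPy. The following is a minimal sketch that reuses the sample points from the question; the names `exact`, `smooth`, `resid_exact` and `resid_smooth` are illustrative only.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sample points from the question.
x = np.array([0.0, 0.4, 0.5, 0.6, 0.8, 1.0])
y = np.array([0.0, 0.5, 0.9, 0.7, 0.3, 1.0])

exact = UnivariateSpline(x, y, k=3, s=0)   # interpolating spline
smooth = UnivariateSpline(x, y, k=3)       # default s=None: smoothing spline

# Largest distance between each fitted curve and the input points.
resid_exact = np.abs(exact(x) - y).max()
resid_smooth = np.abs(smooth(x) - y).max()

print("max residual with s=0:   ", resid_exact)
print("max residual with s=None:", resid_smooth)
```

On this data the s=0 residual should be at rounding-error level, while the default fit should leave a clearly nonzero gap at the input points.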

The Series.interpolate method always fills in NaN values without changing the non-NaN values. There is no way to make Series.interpolate modify the non-NaN values. So, when s != 0, the filled-in values follow a curve that misses the retained original points, and the result shows jagged jumps at those points.
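A short check of this behaviour, as a sketch rather than a definitive test: it builds a series like the question's `itp` column, but assigns the known values by position (the positions in `known_pos` are an assumption chosen to match x = 0, 0.4, 0.5, 0.6, 0.8, 1 on a 31-point grid). It requires SciPy to be installed for method='spline'.

```python
import numpy as np
import pandas as pd

index = np.linspace(0, 1, 31)
raw = pd.Series(np.nan, index=index)

# Known samples, placed by position to avoid float label lookups.
known_pos = [0, 12, 15, 18, 24, 30]
raw.iloc[known_pos] = [0.0, 0.5, 0.9, 0.7, 0.3, 1.0]

# Default smoothing (s=None is passed through to SciPy).
filled = raw.interpolate(method='spline', order=3)

known = raw.notna()
# True if every originally known value survived untouched.
unchanged = np.array_equal(filled[known].to_numpy(), raw[known].to_numpy())

print("known values unchanged:", unchanged)
print("remaining NaNs:", int(filled.isna().sum()))
```

Only the NaN slots are written, which is why the filled curve and the retained points can disagree.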

So if you want the s=None (default) spline interpolation but without the jagged jumps, as you've already found, you have to call UnivariateSpline directly and overwrite all the values in df['itp']:

df['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df.index)
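If the known points live in the series itself rather than in separate `x` and `y` tuples, the same overwrite can be wrapped in a small helper. This is a sketch under assumptions: `smooth_fill` is a hypothetical name, not a pandas or SciPy function, and it assumes a numeric, strictly increasing index.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline


def smooth_fill(series, k=3, s=None):
    """Fit a smoothing spline to the non-NaN values and evaluate it
    at every index label, replacing known values as well."""
    known = series.dropna()
    spline = UnivariateSpline(
        known.index.to_numpy(dtype=float),
        known.to_numpy(dtype=float),
        k=k,
        s=s,
    )
    values = spline(series.index.to_numpy(dtype=float))
    return pd.Series(values, index=series.index, name=series.name)


raw = pd.Series(np.nan, index=np.linspace(0, 1, 31), name='itp')
raw.iloc[[0, 12, 15, 18, 24, 30]] = [0.0, 0.5, 0.9, 0.7, 0.3, 1.0]

smoothed = smooth_fill(raw)
print(smoothed.head())
```

Unlike Series.interpolate, the returned series is smooth everywhere, at the cost of no longer matching the original samples exactly.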

If you want a cubic spline that passes through all the non-NaN data points, then use s=0:

df['itp'] = df['itp'].interpolate('spline', order=3, s=0)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.interpolate as interpolate

samples = { 0.0: 0.0, 0.4: 0.5, 0.5: 0.9, 0.6: 0.7, 0.8:0.3, 1.0: 1.0 }
x, y = zip(*sorted(samples.items()))

fig, ax = plt.subplots(nrows=3, sharex=True)
df1 = pd.DataFrame(index=np.linspace(0, 1, 31), columns=['raw', 'itp'], dtype=float)
df1.loc[x] = np.array(y)[:, None]

df2 = df1.copy()
df3 = df1.copy()

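# Assign the result back rather than using inplace=True on a column selection,
# which may not update the parent DataFrame in recent pandas versions.
df1['itp'] = df1['itp'].interpolate('spline', order=3)
df2['itp'] = interpolate.UnivariateSpline(x, y, k=3)(df2.index)
df3['itp'] = df3['itp'].interpolate('spline', order=3, s=0)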
for i, df in enumerate((df1, df2, df3)):
    df.plot(style={'itp': 'b-', 'raw': 'rs'}, figsize=(8, 6), ax=ax[i])
plt.show()


unutbu answered Sep 17 '22 02:09