Interpolating time series in Pandas using Cubic spline

Tags:

python

pandas

I would like to fill gaps in a column of my DataFrame using a cubic spline. If I exported the column to a list, I could use SciPy's interp1d function and apply it to the missing values.

Is there a way to use this function inside pandas?

user1911866 asked Dec 18 '12 09:12

1 Answer

Most numpy/scipy functions only require their arguments to be "array_like", and interp1d is no exception. Fortunately, both Series and DataFrame are "array_like", so we don't need to leave pandas:

import pandas as pd
import numpy as np
from scipy.interpolate import interp1d

df = pd.DataFrame([np.arange(1, 6), [1, 8, 27, np.nan, 125]]).T

In [5]: df
Out[5]: 
   0    1
0  1    1
1  2    8
2  3   27
3  4  NaN
4  5  125

df2 = df.dropna()  # fit the spline only on the rows without NaN
f = interp1d(df2[0], df2[1], kind='cubic')
# f(4) == array(63.9999999999992)

df[1] = f(df[0])  # evaluate the spline at every x, which fills the gap

In [10]: df
Out[10]: 
   0    1
0  1    1
1  2    8
2  3   27
3  4   64
4  5  125
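
As a further sketch (not part of the original answer), pandas also exposes this directly through Series.interpolate with method='cubic', which delegates to SciPy and therefore needs SciPy installed. It uses the index values as x and only touches the NaN entries. The Series name s and the explicit index below are assumptions chosen to mirror the toy data above:

```python
import numpy as np
import pandas as pd

# Same toy column as above (y = x**3 with one gap), with x stored as the index.
s = pd.Series([1, 8, 27, np.nan, 125], index=[1, 2, 3, 4, 5], dtype=float)

# method='cubic' treats the index values as x and requires SciPy.
# Only the NaN entries are replaced; existing values are left alone.
filled = s.interpolate(method="cubic")
print(filled)
```

For a time series, the same call can be tried on a Series with a DatetimeIndex, so that the spacing of the timestamps is taken into account rather than the row position.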

Note: I couldn't think of an example off the top of my head that passes a DataFrame as the second argument (y), but this ought to work too.
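
One possible illustration of that note, offered as a sketch rather than the answerer's own example: interp1d accepts a two-dimensional y, and axis=0 tells it that each row of y belongs to one x value. The column names sq and cube are made up for this example, and the DataFrame is converted with to_numpy() to keep the call explicit:

```python
import numpy as np
import pandas as pd
from scipy.interpolate import interp1d

x = pd.Series([1.0, 2.0, 3.0, 5.0])
# Hypothetical two-column y: squares and cubes of x, one row per x value.
y = pd.DataFrame({"sq": x**2, "cube": x**3})

# axis=0: rows of y line up with the points in x, so every column
# gets its own cubic interpolant from a single interp1d call.
f = interp1d(x.to_numpy(), y.to_numpy(), kind="cubic", axis=0)

# Evaluate both columns at the missing x = 4 and label the result.
row = pd.Series(f(4.0), index=y.columns)
print(row)
```

With only four known points the cubic fit reproduces both polynomials, so the values at x = 4 come out close to 16 and 64.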

Andy Hayden answered Sep 20 '22 22:09