
Pandas interpolation replacing NaNs after the last data point, but not before the first data point

I am using pandas interpolate() to fill NaN values like this:

In [1]: s = pandas.Series([np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan])

In [2]: s.interpolate()
Out[2]: 
0   NaN
1   NaN
2     1
3     2
4     3
5     3
6     3
dtype: float64

In [3]: pandas.version.version
Out[3]: '0.16.2'

Why does pandas replace the values at index 5 and 6 with 3s, but leave the values at 0 and 1 as is?

Can I change this behavior? I'd like to leave NaN at index 5 and 6.

(Actually, I'd like it to linearly extrapolate to fill all of 0, 1, 5, and 6, but that's kind of a different question. Bonus points if you answer it too!)

asked Jul 10 '15 05:07 by foobarbecue



2 Answers

The interpolate method accepts a 'limit' parameter, which caps how many consecutive NaNs are filled.

>>> df = pd.DataFrame([0, np.nan, np.nan, np.nan, np.nan, np.nan, 2])
>>> df
    0
0   0
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6   2
>>> df.interpolate(limit=2)
          0
0  0.000000
1  0.333333
2  0.666667
3       NaN
4       NaN
5       NaN
6  2.000000

By default, filling proceeds in the forward direction only, so NaNs that follow a valid value are filled while NaNs that precede the first valid value are not. This is why the first rows of your series are left as NaN. You can change the direction with the 'limit_direction' parameter.

>>> df.interpolate(limit=2, limit_direction='backward')
          0
0  0.000000
1       NaN
2       NaN
3       NaN
4  1.333333
5  1.666667
6  2.000000
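
Applied to the series from the question, the difference between the default direction and 'both' can be sketched as follows. This assumes a pandas version that supports 'limit_direction' (as far as I know it was added in 0.17.0, so it is not available in the asker's 0.16.2). Note that with the default linear method, the edge NaNs are filled with the nearest valid value rather than extrapolated.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan])

# Default direction is forward: trailing NaNs are filled with the last
# valid value, leading NaNs are left untouched.
forward = s.interpolate()

# 'both' also fills the leading NaNs, again with the nearest valid value
# (the linear method does not extrapolate past the known points).
both = s.interpolate(limit_direction='both')

print(forward.tolist())
print(both.tolist())
```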

To fill both the first and the last rows of your dataframe, set 'limit' to a non-zero value and 'limit_direction' to 'both':

>>> df = pd.DataFrame([np.nan, np.nan, 0, np.nan, 2, np.nan, 8, 5, np.nan, np.nan])
>>> df
    0
0 NaN
1 NaN
2   0
3 NaN
4   2
5 NaN
6   8
7   5
8 NaN
9 NaN
>>> df.interpolate(method='spline', order=1, limit=10, limit_direction='both')
          0
0 -3.807382
1 -2.083581
2  0.000000
3  1.364022
4  2.000000
5  4.811625
6  8.000000
7  5.000000
8  4.937632
9  4.138735
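
For the other half of the question (leaving NaN at index 5 and 6), newer pandas versions offer a 'limit_area' parameter. The sketch below assumes pandas 0.23 or later, which is newer than the 0.16.2 used in the question; with 'inside', only NaNs surrounded by valid values are filled.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan])

# limit_area='inside' restricts filling to NaNs that have a valid value
# on both sides, so leading and trailing NaNs stay as they are.
inside = s.interpolate(limit_area='inside')

print(inside.tolist())
```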

The subject has been discussed here

answered Oct 14 '22 12:10 by JulienV


This interpolate behaviour in pandas looks strange. You can use scipy.interpolate.interp1d instead to produce the expected result. For linear extrapolation, you can write a simple function.

import pandas as pd
import numpy as np
# import the submodule explicitly; a bare "import scipy" may not load scipy.interpolate
from scipy.interpolate import interp1d

s = pd.Series([np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan])

# interpolate using scipy
# ===========================================
s_no_nan = s.dropna()
# bounds_error=False returns NaN for points outside the known range instead of raising
func = interp1d(s_no_nan.index.values, s_no_nan.values, kind='linear', bounds_error=False)
s_interpolated = pd.Series(func(s.index.values), index=s.index)

Out[107]: 
0   NaN
1   NaN
2     1
3     2
4     3
5   NaN
6   NaN
dtype: float64

# extrapolate using user-defined func
# ===========================================
def my_extrapolate_func(scipy_interpolate_func, new_x):
    # Draw one straight line through the first and last known points and
    # evaluate it at every new_x. This equals linear extrapolation only when
    # the known points are collinear, as they are in this example.
    x1, x2 = scipy_interpolate_func.x[0], scipy_interpolate_func.x[-1]
    y1, y2 = scipy_interpolate_func.y[0], scipy_interpolate_func.y[-1]
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (new_x - x1)

s_extrapolated = pd.Series(my_extrapolate_func(func, s.index.values), index=s.index)

Out[108]: 
0   -1
1    0
2    1
3    2
4    3
5    4
6    5
dtype: float64
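
The helper above uses only the first and last known points, so it also overwrites the interior values with that single line. A variant that keeps the piecewise-linear interpolation inside the known range and extends only the two end segments outward could look like the sketch below. The function name extrapolate_linear is hypothetical, and it assumes a sorted numeric index and at least two non-NaN values.

```python
import numpy as np
import pandas as pd

def extrapolate_linear(s):
    """Interpolate interior NaNs linearly and extend the end segments outward.

    Hypothetical helper: assumes a sorted numeric index and >= 2 known points.
    """
    known = s.dropna()
    x = known.index.to_numpy(dtype=float)
    y = known.to_numpy(dtype=float)
    new_x = s.index.to_numpy(dtype=float)

    # np.interp handles the interior and clamps to y[0] / y[-1] outside it.
    out = np.interp(new_x, x, y)

    # Replace the clamped edge values with the slope of the nearest segment.
    left = new_x < x[0]
    right = new_x > x[-1]
    out[left] = y[0] + (y[1] - y[0]) / (x[1] - x[0]) * (new_x[left] - x[0])
    out[right] = y[-1] + (y[-1] - y[-2]) / (x[-1] - x[-2]) * (new_x[right] - x[-1])
    return pd.Series(out, index=s.index)

s = pd.Series([np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan])
print(extrapolate_linear(s).tolist())
```

On the series from the question this gives the same values as the helper above, because the two known points lie on one line; the results differ once the known points are not collinear.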

answered Oct 14 '22 14:10 by Jianxun Li