
How to interpolate only between values (stopping before and after last NaN in a column) with pandas?

I have a df similar to this one:

                       A  B  C    D    E
2016-08-10 13:57:00  3.6  A  1  NaN  NaN
2016-08-10 13:58:00  4.7  A  1  4.5  NaN
2016-08-10 13:59:00  3.4  A  0  NaN  5.7
2016-08-10 14:00:00  3.5  A  0  NaN  NaN
2016-08-10 14:01:00  2.6  A  0  4.6  NaN
2016-08-10 14:02:00  4.8  A  0  NaN  4.3
2016-08-10 14:03:00  5.7  A  1  NaN  NaN
2016-08-10 14:04:00  5.5  A  1  5.7  NaN
2016-08-10 14:05:00  5.6  A  1  NaN  NaN
2016-08-10 14:06:00  7.8  A  1  NaN  5.2
2016-08-10 14:07:00  8.9  A  0  NaN  NaN
2016-08-10 14:08:00  3.6  A  0  NaN  NaN

print (df.dtypes)
A    float64
B     object
C      int64
D    float64
E    float64
dtype: object

Thanks to a lot of input from the community, I now have this code, which upsamples my df to one-second intervals and applies different methods to different dtypes:

int_cols = df.select_dtypes(['int64']).columns
index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(index)

for col in df2:
    if col == int_cols.all():
        df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:

I am now looking for a way to interpolate only between my actual measurements. The interpolate function extends my last measurement until the end of the df:

                            A  B  C    D    E
2016-08-10 14:07:56  3.953333  A  0  5.7  5.2
2016-08-10 14:07:57  3.865000  A  0  5.7  5.2
2016-08-10 14:07:58  3.776667  A  0  5.7  5.2
2016-08-10 14:07:59  3.688333  A  0  5.7  5.2
2016-08-10 14:08:00  3.600000  A  0  5.7  5.2

But I would like the interpolation to stop where the last measurement took place (for example at 14:04:00 for col['D'] and 14:06:00 for col['E']) and leave the NaNs after it.

I tried setting 'limit' to zero and 'limit_direction' to 'both':

for col in df2:
    if col == int_cols.all():
        df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:
        df2[col].interpolate(inplace=True, limit=0, limit_direction='both')

but this didn't change the output. I then tried to incorporate the solution I found in this question, "Pandas: interpolation where first and last data point in column is NaN", into my code:

for col in df2:
    if col == int_cols.all():
        df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:
        df2[col].loc[df2[col].first_valid_index(): df2[col].last_valid_index()]=df2[col].loc[df2[col].first_valid_index(): df2[col].last_valid_index()].astype(float).interpolate(inplace=True)

...but that did not work, and my float64 columns are now purely NaNs. Also, the way I inserted the code, I know it would only have affected the float columns. In an ideal solution I would like to apply this first_valid_index():last_valid_index() selection to the object and int64 columns as well. Can somebody help me? Thank you.

vera asked Feb 06 '23 06:02


1 Answer

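Since pandas 0.23.0 it is possible to use the limit_area parameter of interpolate. With limit_area='inside', only NaNs surrounded by valid values are filled, so leading and trailing NaNs are left untouched: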

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.0, np.nan, np.nan, 2.0, np.nan]},
                  columns=['A', 'B'],
                  index=pd.date_range(start='2016-08-10 13:50:00', periods=7, freq='s'))
print (df)
                       A    B
2016-08-10 13:50:00  NaN  NaN
2016-08-10 13:50:01  1.0  NaN
2016-08-10 13:50:02  NaN  0.0
2016-08-10 13:50:03  NaN  NaN
2016-08-10 13:50:04  4.0  NaN
2016-08-10 13:50:05  NaN  2.0
2016-08-10 13:50:06  NaN  NaN

df = df.interpolate(limit_direction='both', limit_area='inside')
print (df)
                       A         B
2016-08-10 13:50:00  NaN       NaN
2016-08-10 13:50:01  1.0       NaN
2016-08-10 13:50:02  2.0  0.000000
2016-08-10 13:50:03  3.0  0.666667
2016-08-10 13:50:04  4.0  1.333333
2016-08-10 13:50:05  NaN  2.000000
2016-08-10 13:50:06  NaN       NaN
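
The same idea can probably be carried over to the mixed-dtype frame from the question. The following is only a rough sketch under some assumptions: the four-row frame is a shortened stand-in for the question's data, the names float_cols, other_cols and sec_index are made up for illustration, and forward filling the object and integer columns is my guess at what is wanted for them.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the question's frame (assumption: minute data, mixed dtypes).
idx = pd.date_range("2016-08-10 13:57:00", periods=4, freq="min")
df = pd.DataFrame(
    {
        "A": [3.6, 4.7, 3.4, 3.5],
        "B": ["A", "A", "A", "A"],
        "C": [1, 1, 0, 0],
        "D": [np.nan, 4.5, 5.7, np.nan],
    },
    index=idx,
)

# Record the dtypes before reindexing, because reindexing turns
# integer columns into floats (the new rows contain NaN).
int_cols = [c for c in df.columns if pd.api.types.is_integer_dtype(df[c])]
float_cols = [c for c in df.columns if pd.api.types.is_float_dtype(df[c])]

# Upsample to one-second resolution; the new rows are NaN in every column.
sec_index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(sec_index)

# Float columns: fill only NaNs that lie between two valid values,
# leaving the NaNs before the first and after the last measurement.
df2[float_cols] = df2[float_cols].interpolate(limit_area="inside")

# Remaining columns: carry the last observation forward,
# then restore the integer dtype that reindexing removed.
other_cols = [c for c in df2.columns if c not in float_cols]
df2[other_cols] = df2[other_cols].ffill()
df2 = df2.astype({c: "int64" for c in int_cols})
```

Selecting the float columns explicitly keeps interpolate away from the object column. Column D stays NaN before 13:58:00 and after 13:59:00, while column A, which has no missing measurements at either end, is interpolated over the whole range.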
jezrael answered Feb 08 '23 23:02
