I have a df similar to this one:
print(df)
A B C D E
DATE_TIME
2016-08-10 13:57:00 3.6 A 1 NaN NaN
2016-08-10 13:58:00 4.7 A 1 4.5 NaN
2016-08-10 13:59:00 3.4 A 0 NaN 5.7
2016-08-10 14:00:00 3.5 A 0 NaN NaN
2016-08-10 14:01:00 2.6 A 0 4.6 NaN
2016-08-10 14:02:00 4.8 A 0 NaN 4.3
2016-08-10 14:03:00 5.7 A 1 NaN NaN
2016-08-10 14:04:00 5.5 A 1 5.7 NaN
2016-08-10 14:05:00 5.6 A 1 NaN NaN
2016-08-10 14:06:00 7.8 A 1 NaN 5.2
2016-08-10 14:07:00 8.9 A 0 NaN NaN
2016-08-10 14:08:00 3.6 A 0 NaN NaN
print (df.dtypes)
A float64
B object
C int64
D float64
E float64
dtype: object
Thanks to a lot of input from the community, I now have this code, which allows me to upsample my df to one-second intervals, applying different methods to different dtypes:
import pandas as pd

int_cols = df.select_dtypes(['int64']).columns

# upsample to a 1-second index
index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(index)

for col in df2:
    if col in int_cols:
        # forward-fill integer columns and restore the int dtype
        df2[col].ffill(inplace=True)
        df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:
        # interpolate float columns
        df2[col].interpolate(inplace=True)
    else:
        # forward-fill everything else (object columns)
        df2[col].ffill(inplace=True)
I am now looking for a way to only interpolate between my actual measurements. The interpolate function extends my last measurement until the end of the df:
df2.tail()
Out[75]:
A B C D E
2016-08-10 14:07:56 3.953333 A 0 5.7 5.2
2016-08-10 14:07:57 3.865000 A 0 5.7 5.2
2016-08-10 14:07:58 3.776667 A 0 5.7 5.2
2016-08-10 14:07:59 3.688333 A 0 5.7 5.2
2016-08-10 14:08:00 3.600000 A 0 5.7 5.2
But I would like to stop this where the last measurement took place (for example at 14:04:00 for col['D'] and 14:06:00 for col['E']) and leave the NaNs.
I tried setting 'limit' to zero and 'limit_direction' to 'both':
for col in df2:
    if col in int_cols:
        df2[col].ffill(inplace=True)
        df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:
        df2[col].interpolate(inplace=True, limit=0, limit_direction='both')
    else:
        df2[col].ffill(inplace=True)
but this didn't change anything about the output. I then tried to incorporate the solution I found in this question: Pandas: interpolation where first and last data point in column is NaN into my code:
for col in df2:
    if col in int_cols:
        df2[col].ffill(inplace=True)
        df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:
        df2[col].loc[df2[col].first_valid_index():df2[col].last_valid_index()] = \
            df2[col].loc[df2[col].first_valid_index():df2[col].last_valid_index()].astype(float).interpolate(inplace=True)
    else:
        df2[col].ffill(inplace=True)
...but that did not work, and my float64 columns are now purely NaNs. Also, the way I inserted the code, I know it would only have affected the float columns. Ideally I would like to apply this first_valid_index():last_valid_index() selection to the object and int64 columns as well. Can somebody help me? Thank you.
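As a side note, the float columns most likely end up all NaN because interpolate(inplace=True) returns None, so the slice assignment above writes None back into the column. A minimal sketch of the first_valid_index()/last_valid_index() idea without inplace, applied to every dtype as requested, could look like this (the helper name fill_inside is made up for illustration and is not part of the original code):

import pandas as pd

def fill_inside(s, how):
    # hypothetical helper: fill a Series only between its first and last valid values
    start, end = s.first_valid_index(), s.last_valid_index()
    if start is None:  # column is all NaN, nothing to fill
        return s
    s = s.copy()
    inner = s.loc[start:end]
    s.loc[start:end] = inner.interpolate() if how == 'interpolate' else inner.ffill()
    return s

int_cols = df.select_dtypes(['int64']).columns
index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(index)

for col in df2:
    if col in int_cols:
        df2[col] = fill_inside(df2[col], 'ffill')
        if not df2[col].isna().any():  # cast back only if no NaNs are left
            df2[col] = df2[col].astype(int)
    elif df2[col].dtype == float:
        df2[col] = fill_inside(df2[col], 'interpolate')
    else:
        df2[col] = fill_inside(df2[col], 'ffill')

This keeps leading and trailing NaNs in every column, because only the slice between each column's first and last measurement is ever filled.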
Since pandas 0.23.0 it is possible to use the parameter limit_area in interpolate:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 1.0, np.nan, np.nan, 4.0, np.nan, np.nan],
                   'B': [np.nan, np.nan, 0.0, np.nan, np.nan, 2.0, np.nan]},
                  columns=['A', 'B'],
                  index=pd.date_range(start='2016-08-10 13:50:00', periods=7, freq='S'))
print (df)
A B
2016-08-10 13:50:00 NaN NaN
2016-08-10 13:50:01 1.0 NaN
2016-08-10 13:50:02 NaN 0.0
2016-08-10 13:50:03 NaN NaN
2016-08-10 13:50:04 4.0 NaN
2016-08-10 13:50:05 NaN 2.0
2016-08-10 13:50:06 NaN NaN
df = df.interpolate(limit_direction='both', limit_area='inside')
print (df)
A B
2016-08-10 13:50:00 NaN NaN
2016-08-10 13:50:01 1.0 NaN
2016-08-10 13:50:02 2.0 0.000000
2016-08-10 13:50:03 3.0 0.666667
2016-08-10 13:50:04 4.0 1.333333
2016-08-10 13:50:05 NaN 2.000000
2016-08-10 13:50:06 NaN NaN
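If you want to keep the per-dtype loop from the question, the same limit_area idea can be combined with it. This is only a rough sketch, assuming pandas >= 0.23 and the df2 and int_cols variables from the question; ffill itself has no limit_area argument in that version, so the non-float columns are blanked out again after their last measurement by hand:

import numpy as np

for col in df2:
    if df2[col].dtype == float and col not in int_cols:
        # true float columns: interpolate, but only between existing measurements
        df2[col] = df2[col].interpolate(limit_direction='both', limit_area='inside')
    else:
        # object columns and the former int column: ffill never fills leading NaNs,
        # so only the part after the last measurement has to be reset to NaN
        last = df2[col].last_valid_index()
        filled = df2[col].ffill()
        if last is not None:
            filled.loc[df2.index > last] = np.nan
        df2[col] = filled

The former int64 column stays float as long as it still contains NaNs; once it is complete it can be cast back with astype(int) as in the original loop.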