I have a dataframe with numeric and non-numeric values with a datetime index:
from datetime import datetime
import pandas as pd

df = pd.DataFrame([
    {'date': datetime(2017, 4, 24, 1), 'a': 1, 'b': 2, 'c': 'hee'},
    {'date': datetime(2017, 4, 24, 2), 'a': 2, 'b': 4, 'c': 'hoo'},
    {'date': datetime(2017, 4, 24, 3), 'a': 4, 'b': 8, 'c': 'joo'},
    {'date': datetime(2017, 4, 24, 4), 'a': 8, 'b': 16, 'c': 'jee'}
]).set_index('date')
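I would like to resample it to a 20-minute frequency, interpolating the numeric columns linearly and filling the non-numeric column with the nearest value.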
What is the most elegant implementation?
Strategy 1
Interpolate all, then fillna:
df = df.resample('20T').interpolate('linear')
df.fillna(method='nearest')
But the nearest method is not implemented for fillna.
Strategy 2
Split the numeric and non-numeric columns:
df2 = df.resample('20T')
df_a = df2._get_numeric_data().interpolate('linear')
df_b = df2[list(set(df.columns) - set(set(df_a.columns)))].interpolate('nearest')
which gives an error:
TypeError: Cannot interpolate with all NaNs.
Update
Interpolation with the nearest method works for Boolean and numeric values, but not for strings, e.g.:
df.resample('20T').interpolate('nearest')
Since interpolate("nearest") works fine with numeric types, a solution is to convert the string column to numeric category codes, apply interpolate("nearest") to the codes, and map the interpolated codes back to strings:
import numpy as np

def fillna_nearest(series):
    # Encode the values as integer codes; factorize marks NaN as -1
    fact = series.astype('category').factorize()
    series_cat = pd.Series(fact[0]).replace(-1, np.nan)  # codes with NaN restored (positional index, not the dates)
    series_cat_interp = series_cat.interpolate("nearest")  # nearest-neighbour fill of the codes (requires SciPy)
    cat_to_string = {i: x for i, x in enumerate(fact[1])}  # dict mapping each code to its original value
    series_str_interp = series_cat_interp.map(cat_to_string)  # turn the codes back into the original values
    return series_str_interp
In [10]: df.resample('20T').interpolate().apply(fillna_nearest)
Out[10]:
          a          b    c
0  1.000000   2.000000  hee
1  1.333333   2.666667  hee
2  1.666667   3.333333  hoo
3  2.000000   4.000000  hoo
4  2.666667   5.333333  hoo
5  3.333333   6.666667  joo
6  4.000000   8.000000  joo
7  5.333333  10.666667  joo
8  6.666667  13.333333  jee
9  8.000000  16.000000  jee
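If the datetime index should be kept, and SciPy is not available, one possible variant is to split the columns by dtype and let the resampler's own nearest() handle the non-numeric ones. This is only a sketch under a few assumptions: resample_mixed is a made-up helper name, the frame is assumed to have at least one numeric and one non-numeric column, and '20min' is used as an equivalent spelling of the '20T' rule.

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame([
    {'date': datetime(2017, 4, 24, 1), 'a': 1, 'b': 2, 'c': 'hee'},
    {'date': datetime(2017, 4, 24, 2), 'a': 2, 'b': 4, 'c': 'hoo'},
    {'date': datetime(2017, 4, 24, 3), 'a': 4, 'b': 8, 'c': 'joo'},
    {'date': datetime(2017, 4, 24, 4), 'a': 8, 'b': 16, 'c': 'jee'},
]).set_index('date')


def resample_mixed(frame, rule):
    """Hypothetical helper: linear for numeric columns, nearest for the rest."""
    numeric = frame.select_dtypes(include='number')
    other = frame.drop(columns=numeric.columns)
    # Upsample the numeric columns and interpolate linearly between samples
    numeric_up = numeric.resample(rule).interpolate('linear')
    # Upsample the remaining columns, taking the value of the closest timestamp
    other_up = other.resample(rule).nearest()
    # Recombine and restore the original column order
    return pd.concat([numeric_up, other_up], axis=1)[frame.columns]


result = resample_mixed(df, '20min')
print(result)
```

The result keeps the DatetimeIndex, and column c should follow the same hee/hoo/joo/jee pattern as the output above.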
Is that what you want?
In [22]: df.resample('20T').interpolate().ffill()
Out[22]:
                            a          b    c
date
2017-04-24 01:00:00  1.000000   2.000000  hee
2017-04-24 01:20:00  1.333333   2.666667  hee
2017-04-24 01:40:00  1.666667   3.333333  hee
2017-04-24 02:00:00  2.000000   4.000000  hoo
2017-04-24 02:20:00  2.666667   5.333333  hoo
2017-04-24 02:40:00  3.333333   6.666667  hoo
2017-04-24 03:00:00  4.000000   8.000000  joo
2017-04-24 03:20:00  5.333333  10.666667  joo
2017-04-24 03:40:00  6.666667  13.333333  joo
2017-04-24 04:00:00  8.000000  16.000000  jee
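Note that ffill carries the previous value forward, which is not the same as nearest for timestamps that lie closer to the next sample. A small sketch of the difference on column c alone (the series is rebuilt here so the snippet runs on its own, and '20min' is the same rule as '20T'):

```python
import pandas as pd

# Column c from the question, on its hourly index
c = pd.Series(
    ['hee', 'hoo', 'joo', 'jee'],
    index=pd.date_range('2017-04-24 01:00', periods=4, freq='60min'),
    name='c',
)

compare = pd.DataFrame({
    'ffill': c.resample('20min').ffill(),      # previous known value
    'nearest': c.resample('20min').nearest(),  # value at the closest timestamp
})

# Rows where the two fill strategies disagree
differs = compare[compare['ffill'] != compare['nearest']]
print(differs)
```

The two only disagree at the :40 marks, where the next hourly sample is closer than the previous one.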