pandas interpolate with nearest for non-numeric values

Question

I have a DataFrame with numeric and non-numeric columns and a datetime index:

from datetime import datetime
import pandas as pd

df = pd.DataFrame([
    {'date': datetime(2017, 4, 24, 1), 'a':1, 'b':2, 'c': "hee"},
    {'date': datetime(2017, 4, 24, 2), 'a':2, 'b':4, 'c': 'hoo'},
    {'date': datetime(2017, 4, 24, 3), 'a':4, 'b':8, 'c': 'joo'},
    {'date': datetime(2017, 4, 24, 4), 'a':8, 'b':16, 'c': 'jee'}
]).set_index('date')

I would like to:

linearly interpolate the numeric values; and
get the nearest value for non-numeric values.

What is the most elegant implementation?

Strategy 1

Interpolate all, then fillna:

df = df.resample('20T').interpolate('linear')
df.fillna(method='nearest')

But fillna does not implement a 'nearest' method.

Strategy 2

Split the numeric and non-numeric columns and interpolate each separately:

df2 = df.resample('20T')
df_a = df2._get_numeric_data().interpolate('linear')
df_b = df2[list(set(df.columns) - set(set(df_a.columns)))].interpolate('nearest')

which gives an error:

TypeError: Cannot interpolate with all NaNs.

Update

Interpolation with the nearest method works for Boolean and numeric values, but not for strings, e.g.:

df.resample('20T').interpolate('nearest')

AMH · Accepted Answer

Since interpolate("nearest") works fine with numeric types, a solution is to:

Convert the string column to categorical and take its integer codes (which are numeric)
Interpolate the codes with interpolate("nearest")
Map the interpolated codes back to strings

import numpy as np
import pandas as pd

def fillna_nearest(series):
    # factorize() returns (integer codes, unique values); missing values get code -1
    fact = series.astype('category').factorize()

    # Codes as a float Series with NaN for missing; note this has a fresh 0..n-1 index
    series_cat = pd.Series(fact[0]).replace(-1, np.nan)
    # Nearest-neighbour fill of the codes (interpolate("nearest") requires SciPy)
    series_cat_interp = series_cat.interpolate("nearest")

    # Dict mapping each code back to its original value
    cat_to_string = {i: x for i, x in enumerate(fact[1])}
    series_str_interp = series_cat_interp.map(cat_to_string)

    return series_str_interp


In [10]: df.resample('20T').interpolate().apply(fillna_nearest)
Out[10]: 
          a          b    c
0  1.000000   2.000000  hee
1  1.333333   2.666667  hee
2  1.666667   3.333333  hoo
3  2.000000   4.000000  hoo
4  2.666667   5.333333  hoo
5  3.333333   6.666667  joo
6  4.000000   8.000000  joo
7  5.333333  10.666667  joo
8  6.666667  13.333333  jee
9  8.000000  16.000000  jee
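
A different route, closer to Strategy 2 in the question, is to split the columns by dtype and fill the non-numeric ones with reindex(method='nearest') instead of interpolate, which avoids the round trip through category codes and keeps the datetime index. This is only a minimal sketch, assuming a regular 20-minute target grid and no exact ties between neighbours; the names new_index, numeric, other and result are illustrative and not from the original answer.

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame([
    {'date': datetime(2017, 4, 24, 1), 'a': 1, 'b': 2, 'c': 'hee'},
    {'date': datetime(2017, 4, 24, 2), 'a': 2, 'b': 4, 'c': 'hoo'},
    {'date': datetime(2017, 4, 24, 3), 'a': 4, 'b': 8, 'c': 'joo'},
    {'date': datetime(2017, 4, 24, 4), 'a': 8, 'b': 16, 'c': 'jee'},
]).set_index('date')

# Target grid: every 20 minutes between the first and last timestamp (assumption)
new_index = pd.date_range(df.index.min(), df.index.max(), freq='20min')

# Split by dtype
numeric = df.select_dtypes(include='number')
other = df.select_dtypes(exclude='number')

# Numeric columns: insert the new rows as NaN, then interpolate linearly
numeric_interp = numeric.reindex(new_index).interpolate('linear')

# Non-numeric columns: take the value at the nearest original timestamp
other_nearest = other.reindex(new_index, method='nearest')

# Recombine in the original column order
result = pd.concat([numeric_interp, other_nearest], axis=1)[df.columns]
print(result)
```

On this data the c column comes out the same as in the output above, while the index stays a DatetimeIndex.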

MaxU - stop WAR against UA · Answer

Is that what you want?

In [22]: df.resample('20T').interpolate().ffill()
Out[22]:
                            a          b    c
date
2017-04-24 01:00:00  1.000000   2.000000  hee
2017-04-24 01:20:00  1.333333   2.666667  hee
2017-04-24 01:40:00  1.666667   3.333333  hee
2017-04-24 02:00:00  2.000000   4.000000  hoo
2017-04-24 02:20:00  2.666667   5.333333  hoo
2017-04-24 02:40:00  3.333333   6.666667  hoo
2017-04-24 03:00:00  4.000000   8.000000  joo
2017-04-24 03:20:00  5.333333  10.666667  joo
2017-04-24 03:40:00  6.666667  13.333333  joo
2017-04-24 04:00:00  8.000000  16.000000  jee
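
Forward fill and nearest are not the same thing: ffill always carries the previous value forward, while nearest switches to the next value once it is closer in time. The sketch below compares the two on column c alone, assuming the same hourly data and 20-minute grid as in the question; the names c, grid, ffilled, nearest and differs are illustrative.

```python
import pandas as pd

# Column c from the question, on its hourly index
c = pd.Series(
    ['hee', 'hoo', 'joo', 'jee'],
    index=pd.date_range('2017-04-24 01:00', periods=4, freq='60min'),
    name='c',
)

# 20-minute target grid
grid = pd.date_range(c.index[0], c.index[-1], freq='20min')

ffilled = c.reindex(grid, method='ffill')      # previous known value
nearest = c.reindex(grid, method='nearest')    # closest known value in time

# Timestamps where the two fills disagree
differs = grid[(ffilled != nearest).to_numpy()]
print(differs)
```

On this grid the two fills disagree at every :40 timestamp, where the following hour is closer than the previous one.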

Tags: datetime, python-3.x, pandas

Joost Döbken
