
pandas.Series.interpolate() does nothing. Why?

Tags: python, pandas

I have a DataFrame with a DatetimeIndex. This is one of its columns:

>>> y.out_brd
2013-01-01 11:25:00     0.04464286
2013-01-01 11:30:00            NaN
2013-01-01 11:35:00            NaN
2013-01-01 11:40:00    0.005952381
2013-01-01 11:45:00     0.01785714
2013-01-01 11:50:00    0.008928571
Freq: 5T, Name: out_brd, dtype: object

When I call interpolate() on this column, absolutely nothing changes:

>>> y.out_brd.interpolate(method='time')
2013-01-01 11:25:00     0.04464286
2013-01-01 11:30:00            NaN
2013-01-01 11:35:00            NaN
2013-01-01 11:40:00    0.005952381
2013-01-01 11:45:00     0.01785714
2013-01-01 11:50:00    0.008928571
Freq: 5T, Name: out_brd, dtype: object

How can I make it work?

Update: here is the code that generates such a DataFrame.

import pandas as pd
from datetime import datetime, timedelta

time_index = pd.date_range(start=datetime(2013, 1, 1, 3),
                           end=datetime(2013, 1, 2, 2, 59),
                           freq='5T')
grid_columns = [u'in_brd', u'in_alt', u'out_brd', u'out_alt']

df = pd.DataFrame(index=time_index, columns=grid_columns)

After that, I fill the cells with some data.

I have a DataFrame field_data with survey data on boarding and alighting on a railroad, and a station variable. I also have an interval_end function defined like this:

interval_end = lambda index, prec_lvl: index.to_datetime() \
                        + timedelta(minutes=prec_lvl - 1,
                                    seconds=59)

The code:

for index, row in df.iterrows():
    recs = field_data[(field_data.station_name == station)
                    & (field_data.arrive_time >= index.time())
                    & (field_data.arrive_time <= interval_end(
                                        index, prec_lvl).time())]
    in_recs_num = recs[recs.orientation == u'in'][u'train_number'].count()
    out_recs_num = recs[recs.orientation == u'out'][u'train_number'].count()

    if in_recs_num:
        df.loc[index, u'in_brd'] = recs[
                recs.orientation == u'in'][u'boarding'].sum()    / \
                (in_recs_num * CAR_CAPACITY)
        df.loc[index, u'in_alt'] = recs[
                recs.orientation == u'in'][u'alighting'].sum()   / \
                (in_recs_num * CAR_CAPACITY)
    if out_recs_num:
        df.loc[index, u'out_brd'] = recs[
                recs.orientation == u'out'][u'boarding'].sum()  / \
                (out_recs_num * CAR_CAPACITY)
        df.loc[index, u'out_alt'] = recs[
                recs.orientation == u'out'][u'alighting'].sum() / \
                (out_recs_num * CAR_CAPACITY)
Mikhail Elizarev asked Apr 25 '14 10:04



3 Answers

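You can also fix this without reassigning the column by interpolating in place with inplace=True. This does not change the dtype, so the column still has to be float64 for the interpolation to fill anything: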

y.out_brd.interpolate(method='time', inplace=True)
Santi Gil answered Nov 14 '22 03:11



You need to convert your Series to dtype float64 instead of its current object dtype. Here's an example to illustrate the difference. Note that object-dtype Series are generally of limited use; the most common case is a Series containing strings. Beyond that, they are very slow because they cannot take advantage of any data type information.

In [9]: s = Series(randn(6), index=pd.date_range('2013-01-01 11:25:00', freq='5T', periods=6), dtype=object)

In [10]: s.iloc[1:3] = nan

In [11]: s
Out[11]:
2013-01-01 11:25:00   -0.69522
2013-01-01 11:30:00        NaN
2013-01-01 11:35:00        NaN
2013-01-01 11:40:00   -0.70308
2013-01-01 11:45:00    -1.5653
2013-01-01 11:50:00    0.95893
Freq: 5T, dtype: object

In [12]: s.interpolate(method='time')
Out[12]:
2013-01-01 11:25:00   -0.69522
2013-01-01 11:30:00        NaN
2013-01-01 11:35:00        NaN
2013-01-01 11:40:00   -0.70308
2013-01-01 11:45:00    -1.5653
2013-01-01 11:50:00    0.95893
Freq: 5T, dtype: object

In [13]: s.astype(float).interpolate(method='time')
Out[13]:
2013-01-01 11:25:00   -0.6952
2013-01-01 11:30:00   -0.6978
2013-01-01 11:35:00   -0.7005
2013-01-01 11:40:00   -0.7031
2013-01-01 11:45:00   -1.5653
2013-01-01 11:50:00    0.9589
Freq: 5T, dtype: float64
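Applied to the column from the question, the conversion could look like the sketch below. It assumes the values are plain floats and NaN that ended up stored as object; the names `idx`, `s`, `fixed` and `empty` are made up for illustration, and `'5min'` is used as the current spelling of the `'5T'` frequency. The last two lines show one possible way to avoid the problem upstream, by creating the empty frame with a float dtype before filling it.

```python
import numpy as np
import pandas as pd

# Same timestamps as the excerpt in the question.
idx = pd.date_range('2013-01-01 11:25:00', periods=6, freq='5min')

# The values are held as generic Python objects (dtype=object),
# mimicking a column filled cell by cell in a frame created without data.
s = pd.Series([0.04464286, np.nan, np.nan, 0.005952381, 0.01785714, 0.008928571],
              index=idx, name='out_brd', dtype=object)

# Convert to float64 first, then interpolate along the time index.
fixed = s.astype(float).interpolate(method='time')
print(fixed)

# Possible upstream alternative: give the empty frame a float dtype
# from the start, so later cell assignments keep numeric columns.
empty = pd.DataFrame(index=idx, columns=['in_brd', 'out_brd'], dtype=float)
print(empty.dtypes)
```

If the column might also contain strings or other non-numeric entries, `astype(float)` will raise; in that case `pd.to_numeric(s, errors='coerce')` may be a better fit, at the cost of turning unparseable entries into NaN.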
Phillip Cloud answered Nov 14 '22 03:11



I'm late, but this solved my problem: interpolate() returns a new Series rather than modifying the original, so you need to assign the result to a variable or back to the column.

y['out_brd'] = y.out_brd.interpolate(method='time')
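A small sketch of why the assignment matters, using made-up float data rather than the asker's frame (the values and the names `result` and `still_missing` are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2013-01-01 11:25:00', periods=4, freq='5min')
y = pd.DataFrame({'out_brd': [1.0, np.nan, np.nan, 4.0]}, index=idx)

# interpolate() returns a new Series; the column in y is left as it was.
result = y['out_brd'].interpolate(method='time')
still_missing = int(y['out_brd'].isna().sum())

# Storing the result back into the frame is what makes the change stick.
y['out_brd'] = result
print(still_missing, int(y['out_brd'].isna().sum()))
```

Assigning back to the column, rather than to `y` itself, keeps the other columns of the frame intact.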
Dimanjan answered Nov 14 '22 05:11
