Pandas

Question

I have a dataframe that looks like this:

n   Date        Area   Rank
12  2007-03-02  Other  4.276250
24  2007-03-02  Other  4.512632
3   2007-03-02  Other  3.513571
36  2007-03-02  Other  4.514000
48  2007-03-02  Other  4.55000

I want to resample so that there is a row for every value of n between the existing ones, and then interpolate the Rank field for those new rows. If n were a datetime or similar object, I could just resample. How would I do the same with a float or int?

Output should be something like this (dummy numbers for Rank, just an example):

n   Date        Area   Rank
3   2007-03-02  Other  3.513571
4   2007-03-02  Other  3.513675
5   2007-03-02  Other  3.524819
6   2007-03-02  Other  3.613427
7   2007-03-02  Other  3.685635
....
....

andrew_reece · Accepted Answer

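# Reindex on every integer n; range() excludes its stop, so add 1 to keep the last row
df = (df.set_index('n')
        .reindex(range(df.n.min(), df.n.max() + 1))
        .interpolate()   # linear interpolation fills the new Rank values
        .reset_index())
# Date and Area are not numeric, so forward-fill them separately
df[['Date','Area']] = df[['Date','Area']].ffill()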

Output:

     n        Date   Area      Rank
0    3  2007-03-02  Other  3.513571
1    4  2007-03-02  Other  3.598313
2    5  2007-03-02  Other  3.683055
3    6  2007-03-02  Other  3.767797
4    7  2007-03-02  Other  3.852539
5    8  2007-03-02  Other  3.937282
6    9  2007-03-02  Other  4.022024
7   10  2007-03-02  Other  4.106766
8   11  2007-03-02  Other  4.191508
9   12  2007-03-02  Other  4.276250
10  13  2007-03-02  Other  4.295948
11  14  2007-03-02  Other  4.315647
                                ...

There may be a way to apply a different fill method to each column based on its type, in which case you wouldn't need the separate ffill() for the non-float columns. I played around with apply() a bit, but couldn't get it to work.
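One possible way to handle columns by type, as a rough sketch rather than a tested recipe: pick the numeric columns with select_dtypes(), interpolate only those, and forward-fill everything else. The helper name fill_by_dtype and its key parameter are made up for this illustration, and the sample frame is rebuilt from the question's data. Doing it this way also avoids calling interpolate() on the non-numeric columns at all.

```python
import pandas as pd

# Sample data from the question (n is unsorted, as posted).
df = pd.DataFrame({
    "n": [12, 24, 3, 36, 48],
    "Date": ["2007-03-02"] * 5,
    "Area": ["Other"] * 5,
    "Rank": [4.276250, 4.512632, 3.513571, 4.514000, 4.55000],
})

def fill_by_dtype(frame, key="n"):
    """Hypothetical helper: add a row for every integer value of `key`,
    then interpolate numeric columns and forward-fill the others."""
    # range() excludes its stop value, so add 1 to keep the largest key.
    full = range(frame[key].min(), frame[key].max() + 1)
    out = frame.set_index(key).reindex(full).rename_axis(key)
    numeric = out.select_dtypes(include="number").columns
    other = out.columns.difference(numeric)
    out[numeric] = out[numeric].interpolate()  # linear by default
    out[other] = out[other].ffill()
    return out.reset_index()

result = fill_by_dtype(df)
print(result.head())
```

With the question's data this should give one row per integer n from 3 through 48, with the same Rank values as the output above.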

Pandas - resample on non-datetime

Tags:

python

Solaxun
