Pandas - resample on non-datetime

Tags: python, pandas

I have a dataframe that looks like this:

n    Date        Area    Rank

12  2007-03-02  Other   4.276250
24  2007-03-02  Other   4.512632
3   2007-03-02  Other   3.513571
36  2007-03-02  Other   4.514000
48  2007-03-02  Other   4.55000

I want to resample so that there is a row for every value of n between the existing ones, and then interpolate the Rank field for the new rows. If n were a datetime or similar object, I could just use resample. How would I do the same when n is a float or int?

The output should look something like this (the Rank values are dummy numbers, just an example):

n    Date        Area    Rank

3   2007-03-02  Other   3.513571
4   2007-03-02  Other   3.513675
5   2007-03-02  Other   3.524819
6   2007-03-02  Other   3.613427
7   2007-03-02  Other   3.685635
....
....
Solaxun asked Aug 31 '25 22:08


1 Answer

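# Reindex on every integer n from min to max; + 1 because range() excludes its end,
# otherwise the row with the largest n would be dropped.
df = (df.set_index('n')
        .reindex(range(df.n.min(), df.n.max() + 1))
        .interpolate()   # linear interpolation of Rank across the new rows
        .reset_index())
# The text columns are not interpolated, so forward-fill them separately.
df[['Date','Area']] = df[['Date','Area']].ffill()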

Output:

     n        Date   Area      Rank
0    3  2007-03-02  Other  3.513571
1    4  2007-03-02  Other  3.598313
2    5  2007-03-02  Other  3.683055
3    6  2007-03-02  Other  3.767797
4    7  2007-03-02  Other  3.852539
5    8  2007-03-02  Other  3.937282
6    9  2007-03-02  Other  4.022024
7   10  2007-03-02  Other  4.106766
8   11  2007-03-02  Other  4.191508
9   12  2007-03-02  Other  4.276250
10  13  2007-03-02  Other  4.295948
11  14  2007-03-02  Other  4.315647
                                ...

There may be a way to interpolate using different methods, based on column type - then you wouldn't need the separate ffill() for the non-float columns. I played around with apply() a bit, but couldn't get it to work.
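
For what it's worth, here is one way the per-column-type idea might be sketched, using select_dtypes() instead of apply(). This is only a sketch under some assumptions: the sample frame is rebuilt from the question, fill_by_dtype is a made-up helper name, and n is assumed to hold integers. It interpolates the numeric columns and forward-fills everything else, so interpolate() is never called on the text columns.

```python
import pandas as pd

# Sample data rebuilt from the question (n is deliberately unsorted, as posted).
df = pd.DataFrame({
    "n": [12, 24, 3, 36, 48],
    "Date": ["2007-03-02"] * 5,
    "Area": ["Other"] * 5,
    "Rank": [4.276250, 4.512632, 3.513571, 4.514000, 4.55000],
})

def fill_by_dtype(frame, key="n"):
    """Hypothetical helper: one row per integer value of `key`, filled per dtype."""
    # + 1 so that the largest key value is kept (range() excludes its end).
    full_range = range(frame[key].min(), frame[key].max() + 1)
    out = frame.set_index(key).reindex(full_range)

    # Split the columns by dtype after reindexing.
    numeric_cols = out.select_dtypes(include="number").columns
    other_cols = out.columns.difference(numeric_cols)

    # Linear interpolation for numbers, forward fill for everything else.
    out[numeric_cols] = out[numeric_cols].interpolate()
    out[other_cols] = out[other_cols].ffill()

    return out.rename_axis(key).reset_index()

result = fill_by_dtype(df)
print(result.head())
```

The default interpolate() treats rows as equally spaced, which is fine here because the reindexed n values are consecutive integers. If n were irregularly spaced floats, interpolate(method="index") would be the option to look at, since it weights by the index values.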

andrew_reece answered Sep 05 '25 02:09


