Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Interpolate between all Values in Two Separate Pandas DataFrames?

Let's assume you have two Pandas DataFrames, one containing data for the year 2020 and the other containing data for the year 2030. Both DataFrames have the same shape, column names, and only contain numeric values. For simplicity, we'll create them as follows:

twenty = pd.DataFrame({'A':[1,1,1], 'B':[3,3,3]})
thirty = pd.DataFrame({'A':[3,3,3], 'B':[7,7,7]})

Now, the goal is to perform a linear interpolation on all values in these DataFrames to obtain a new DataFrame for the year 2025 (or whatever year we select). So, we would want to interpolate between each paired set of values, such as twenty['A'][0] and thirty['A'][0]. If we did this for the target year 2025, the result should be:

twentyfive = pd.DataFrame({'A':[2,2,2],'B':[5,5,5]})

I've attempted to use np.interp; however, that is really intended for interpolation on a given (singular) array as far as I can tell. And I've solved the problem using a more brute-force method of melting the DataFrames, adding year columns, merging them together, and then creating a new column with the interpolated values. It's a bit messy and long-winded.

I feel like there must be a more straight-forward (and optimized) way of performing this task. Any help is appreciated.

like image 470
rhozzy Avatar asked Jan 22 '26 08:01

rhozzy


2 Answers

You can try of taking average directly, if both have same shape

(thirty + twenty)/2

Out:

    A   B
0   2   5
1   2   5
2   2   5

Edit : if the dataframes does not have equal shapes, you can try of merging with inner join and groupby columns to take interpolated mean.

df = pd.merge(twenty,thirty, left_index=True, right_index=True, how='inner').rename(columns=lambda x: x.split('_')[0])
df.T.groupby(df.T.index).mean().T

Out:

    A   B
0   2   5
1   2   5
2   2   5
like image 177
Naga kiran Avatar answered Jan 23 '26 21:01

Naga kiran


You can concat being smart about the keys (naming them integers), and then groupby allowing you to interpolate everything:

import pandas as pd

df = pd.concat([twenty, thirty], keys=[20,30], axis=1)
s = (df.groupby(df.columns.get_level_values(1), axis=1)
        .apply(lambda x: x.T.reset_index(1, drop=True).reindex(np.arange(20,31)).interpolate())).T

      20   21   22   23   24   25   26   27   28   29   30
A 0  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0
  1  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0
  2  1.0  1.2  1.4  1.6  1.8  2.0  2.2  2.4  2.6  2.8  3.0
B 0  3.0  3.4  3.8  4.2  4.6  5.0  5.4  5.8  6.2  6.6  7.0
  1  3.0  3.4  3.8  4.2  4.6  5.0  5.4  5.8  6.2  6.6  7.0
  2  3.0  3.4  3.8  4.2  4.6  5.0  5.4  5.8  6.2  6.6  7.0

Now if you just care about 25:

s[25].unstack(0)

     A    B
0  2.0  5.0
1  2.0  5.0
2  2.0  5.0
like image 39
ALollz Avatar answered Jan 23 '26 22:01

ALollz