Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Dynamically replace NaN values with the average of previous and next non-missing values

Tags:

python

pandas

I have a dataframe df with NaN values and I want to dynamically replace them with the average values of previous and next non-missing values.

In [27]: df 
Out[27]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3       NaN -2.027325  1.533582
4       NaN       NaN  0.461821
5 -0.788073       NaN       NaN
6 -0.916080 -0.612343       NaN
7 -0.887858  1.033826       NaN
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

For example, A[3] is NaN so its value should be (-0.120211-0.788073)/2 = -0.454142. A[4] then should be (-0.454142-0.788073)/2 = -0.621108.

Therefore, the result dataframe should look like:

In [27]: df 
Out[27]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.454142 -2.027325  1.533582
4 -0.621108 -1.319834  0.461821
5 -0.788073 -0.966089 -1.260202
6 -0.916080 -0.612343 -2.121213
7 -0.887858  1.033826 -2.551718
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

Is this a good way to deal with the missing values? I can't simply replace them by the average values of each column because my data is time-series and tends to increase over time. (The initial value may be $0 and final value might be $100000, so the average is $50000 which can be much bigger/smaller than the NaN values).

like image 353
Jun Seong Jang Avatar asked Oct 28 '22 07:10

Jun Seong Jang


1 Answers

You can try to understand your logic behind the average that is Geometric progression

s=df.isnull().cumsum()
t1=df[(s==1).shift(-1).fillna(False)].stack().reset_index(level=0,drop=True)
t2=df.lookup(s.idxmax()+1,s.idxmax().index)
df.fillna(t1/(2**s)+t2*(1-0.5**s)*2/2)
Out[212]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.454142 -2.027325  1.533582
4 -0.621107 -1.319834  0.461821
5 -0.788073 -0.966089 -1.260201
6 -0.916080 -0.612343 -2.121213
7 -0.887858  1.033826 -2.551718
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

Explanation:

1st NaN x/2+y/2=1st

2nd NaN 1st/2+y/2=2nd

3rd NaN 2nd/2+y/2+3rd

Then x/(2**n)+y(1-(1/2)**n)/(1-1/2), this is the key

like image 57
BENY Avatar answered Nov 15 '22 05:11

BENY