Fill missing value by averaging previous row value

I want to fill each missing value with the average of the previous N rows' values in the same column. For example:

import numpy as np
import pandas as pd

N = 2  # number of previous rows to average
df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list('ABCD'))

The DataFrame looks like this:

     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  NaN NaN  5.0
3  NaN  3.0 NaN  NaN

The result should be:

     A   B       C  D
0   NaN 2.0     NaN 0
1   3.0 4.0     NaN 1
2   NaN (4+2)/2 NaN 5
3   NaN 3.0     NaN (1+5)/2

Is there an elegant and fast way to achieve this without a for loop?
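For checking any vectorized answer, the rule described above can be spelled out as a plain (slow) loop. This is a minimal sketch under two assumptions consistent with the expected result: a cell stays NaN when fewer than N rows precede it or any of those N values is NaN, and only original values (not already-filled ones) are averaged. The helper name fill_with_prev_mean_loop is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_with_prev_mean_loop(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Reference version (hypothetical helper): fill each NaN with the mean
    of the previous n ORIGINAL values in the same column. Assumes the cell
    stays NaN if fewer than n rows precede it or any of them is NaN."""
    out = df.astype(float)  # float copy so NaN and means fit in every column
    for col in df.columns:
        orig = df[col].to_numpy(dtype=float)
        filled = orig.copy()
        for i in range(n, len(orig)):
            window = orig[i - n:i]  # the n rows directly above row i
            if np.isnan(orig[i]) and not np.isnan(window).any():
                filled[i] = window.mean()
        out[col] = filled
    return out


df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list('ABCD'))

print(fill_with_prev_mean_loop(df, 2))
```

Comparing a vectorized result against this with DataFrame.equals is a quick way to confirm both follow the same rule.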

asked Nov 16 '18 at 10:11 by Garvey


1 Answer

rolling + mean + shift

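Note that with the default settings, rolling(2).mean() returns NaN whenever either of the two previous values is NaN, because min_periods defaults to the window size. If you instead want the mean of whichever previous values are present, you will need to modify the logic below, for example by passing min_periods=1 to rolling.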
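The effect of min_periods can be seen on a single Series. This is a minimal sketch; the Series s is made up purely for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 3.0, np.nan, 5.0])

# Default: min_periods equals the window size (2), so every window
# that contains a NaN produces NaN.
strict = s.rolling(2).mean()

# min_periods=1: average whatever non-null values the window holds;
# a window with no non-null values still produces NaN.
lenient = s.rolling(2, min_periods=1).mean()

print(pd.DataFrame({'s': s, 'strict': strict, 'lenient': lenient}))
```

Applied to the question's DataFrame, df.fillna(df.rolling(2, min_periods=1).mean().shift()) would fill column A at rows 2 and 3 with 3.0, while columns B and D get the same values as with the default.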

# mean of each row and the row above it, shifted down so row i gets the mean of rows i-2 and i-1
df = df.fillna(df.rolling(2).mean().shift())

print(df)

     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  3.0 NaN  5.0
3  NaN  3.0 NaN  3.0
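The same one-liner generalizes to any N from the question by passing N as the window size. A minimal sketch, where fill_with_prev_mean is a hypothetical helper name and min_periods is simply forwarded to DataFrame.rolling:

```python
import numpy as np
import pandas as pd


def fill_with_prev_mean(df: pd.DataFrame, n: int, min_periods=None) -> pd.DataFrame:
    """Fill each NaN with the mean of the previous n rows of the same column.

    rolling(n).mean() at row i averages rows i-n+1..i; shift() moves that
    result down one row, so row i receives the mean of rows i-n..i-1.
    min_periods is passed straight to rolling(); None (the default) means
    all n previous values must be non-null, otherwise the cell stays NaN.
    """
    prev_mean = df.rolling(n, min_periods=min_periods).mean().shift()
    return df.fillna(prev_mean)


df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, np.nan]],
                  columns=list('ABCD'))

print(fill_with_prev_mean(df, 2))  # same values as the one-liner above
print(fill_with_prev_mean(df, 3))
```

With n=3, the NaN in column D at row 3 becomes (0+1+5)/3 = 2.0, while column B at row 2 stays NaN because only two rows precede it.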
answered Nov 09 '22 at 22:11 by jpp