
Pandas: replace numpy.nan cell with maximum of non-nan adjacent cells

Tags: python, pandas

Test case:

import numpy as np
import pandas as pd

df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))

Each NaN cell A[i, j] should be replaced by the maximum of its non-NaN adjacent cells, where A[i + 1, j], A[i - 1, j], A[i, j + 1], A[i, j - 1] are the entries adjacent to A[i, j].

In other words, this:

     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

should become this:

     A    B    C    D
0  3.0  2.0  2.0  0.0
1  3.0  4.0  4.0  1.0
2  3.0  4.0  5.0  5.0
3  3.0  3.0  4.0  4.0
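
For reference, the adjacency rule above can be written as a brute-force loop. This is only a minimal sketch for checking results on small frames, not an efficient solution. The function name fill_nan_with_neighbor_max is made up for illustration, and leaving a NaN untouched when it has no non-NaN neighbour is an assumption, since the question does not cover that case.

```python
import numpy as np
import pandas as pd


def fill_nan_with_neighbor_max(df):
    """Brute-force reference: replace each NaN with the max of its non-NaN 4-neighbours."""
    a = df.to_numpy(dtype=float)
    out = a.copy()
    n_rows, n_cols = a.shape
    for i in range(n_rows):
        for j in range(n_cols):
            if not np.isnan(a[i, j]):
                continue
            # Collect the up/down/left/right neighbours that lie inside the frame.
            # Values are read from the original array, so fills never cascade.
            neighbours = [
                a[r, c]
                for r, c in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                if 0 <= r < n_rows and 0 <= c < n_cols and not np.isnan(a[r, c])
            ]
            # Assumption: a NaN with no non-NaN neighbour is left as NaN
            if neighbours:
                out[i, j] = max(neighbours)
    return pd.DataFrame(out, index=df.index, columns=df.columns)


df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))
reference = fill_nan_with_neighbor_max(df)
```

On the test case this reproduces the desired table shown above, with every column returned as float.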
user189035 asked Dec 01 '22 14:12


1 Answer

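You can apply a centered rolling window of size 3 along each axis (down the columns, and across the rows via a transpose) and take the max within each window. Because rolling with min_periods=1 skips NaN, the window around a NaN cell contains only its non-NaN neighbours in that direction. The element-wise larger of the two results is therefore the maximum over the adjacent cells, which you can use to fill in the missing values of the original.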

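# Max over each cell and its vertical neighbours; windows with no values become -inf
df1 = df.rolling(3, center=True, min_periods=1).max().fillna(-np.inf)
# Same along the rows, by rolling over the transpose
df2 = df.T.rolling(3, center=True, min_periods=1).max().T.fillna(-np.inf)
# Element-wise larger of the two directions
fill = df1.where(df1 > df2).fillna(df2)
# Returns a new DataFrame; only the NaN cells of df are replaced
df.fillna(fill)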

Output

     A    B    C  D
0  3.0  2.0  2.0  0
1  3.0  4.0  4.0  1
2  3.0  4.0  5.0  5
3  3.0  3.0  4.0  4
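
One thing to watch with the snippet above: a NaN cell with no non-NaN neighbour in either direction would be filled with -inf. A possible variant, sketched here under the assumption that such cells should simply stay NaN, combines the two directions with np.fmax, which ignores a NaN when the other operand has a value. The function name fill_from_adjacent_max is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_from_adjacent_max(df):
    """Fill NaNs with the max of adjacent non-NaN cells using centered rolling windows."""
    # Vertical window: the cell plus the cells above and below (NaNs are skipped)
    vertical = df.rolling(3, center=True, min_periods=1).max()
    # Horizontal window: same idea, applied to the transpose
    horizontal = df.T.rolling(3, center=True, min_periods=1).max().T
    # np.fmax returns the non-NaN operand when only one side is NaN,
    # and NaN only when both sides are NaN
    fill = pd.DataFrame(
        np.fmax(vertical.to_numpy(), horizontal.to_numpy()),
        index=df.index,
        columns=df.columns,
    )
    # Assumption for this sketch: cells with no non-NaN neighbour stay NaN
    return df.fillna(fill)


df = pd.DataFrame([[np.nan, 2, np.nan, 0],
                   [3, 4, np.nan, 1],
                   [np.nan, np.nan, np.nan, 5],
                   [np.nan, 3, np.nan, 4]],
                  columns=list('ABCD'))
result = fill_from_adjacent_max(df)
```

For the test case in the question this gives the same values as the output above; the two versions differ only for isolated NaN cells.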
Ted Petrou answered Dec 04 '22 10:12