Pandas fillna using groupby


I am trying to impute missing values using other rows that share the same values in certain columns.

For example, I have this dataframe:

one | two | three
1      1     10
1      1     nan
1      1     nan
1      2     nan
1      2     20
1      2     nan
1      3     nan
1      3     nan

I want to use columns one and two as keys: for rows that share the same keys, if column three is not entirely NaN, fill its NaN entries with the existing value from a row with those keys.

Here is my desired result:

one | two | three
1      1     10
1      1     10
1      1     10
1      2     20
1      2     20
1      2     20
1      3     nan
1      3     nan

You can see that the rows with keys 1 and 3 stay NaN because that group has no existing value to fill from.

I have tried using groupby+fillna():

df['three'] = df.groupby(['one','two'])['three'].fillna() 

which gave me an error.

I have also tried forward fill, which gave me a rather strange result: it forward-filled column two instead. This is the code I used for the forward fill:

df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill() 
2 Answers

If each group has only one non-NaN value, use ffill (forward fill) followed by bfill (backward fill) within each group, which requires apply with a lambda:

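# group_keys=False keeps the original index so the result can be assigned back
# (needed on pandas >= 2.0, where apply otherwise prepends the group keys)
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN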
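To see why forward fill on its own is not enough here, the following is a minimal self-contained sketch that rebuilds the question's data (the variable names `grouped`, `ffill_only` and `filled` are just for illustration). It uses `transform` instead of `apply`, which is a substitution on my part: `transform` returns a result aligned to the original index, so it can be assigned back directly.

```python
import numpy as np
import pandas as pd

# The question's data: 'one' and 'two' are the grouping keys.
df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

grouped = df.groupby(["one", "two"], sort=False)["three"]

# Forward fill alone cannot reach a NaN that comes before the group's value,
# so row 3 (the first row of group (1, 2)) stays NaN.
ffill_only = grouped.ffill()

# Forward fill then backward fill within each group covers both directions.
# A group that is entirely NaN, such as (1, 3), stays NaN.
filled = grouped.transform(lambda s: s.ffill().bfill())

df["three"] = filled
print(df)
```

Row 3 is the case that `ffill` misses and `bfill` picks up.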

But if a group has multiple values and you need to replace NaN with a single constant, e.g. the group mean:

print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

# group_keys=False keeps the original index so the result can be assigned back
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
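The same mean fill can also be sketched without a lambda, using `transform('mean')` to broadcast each group's mean to its rows and then `fillna` on the original column. This is an alternative formulation rather than the code above, and `group_mean` is a name I made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Group mean repeated on every row of the group (NaN is skipped when averaging).
# A group with no values at all gets NaN as its mean.
group_mean = df.groupby(["one", "two"])["three"].transform("mean")

# fillna aligns on the index and only touches the missing entries,
# so the existing 10 and 40 are left as they are.
df["three"] = df["three"].fillna(group_mean)
print(df)
```

Any other built-in aggregation name, such as 'median', could be passed to `transform` in the same way if the mean is not the constant you want.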
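Alternatively, you can sort the data by the column with missing values, then group by the keys and forward fill. sort_values places NaN last by default, so each group's existing value comes before the rows that need it: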

df.sort_values('three', inplace=True)
df['three'] = df.groupby(['one','two'])['three'].ffill()
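Note that sorting in place leaves the frame ordered by column three. If the original row order matters, a variant along these lines might be preferable; it is only a sketch, and `ordered` and `filled` are illustrative names. It sorts a copy and relies on index alignment when assigning back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# sort_values returns a sorted copy; NaN goes last by default
# (na_position="last"), so each group's value precedes its NaN rows.
ordered = df.sort_values("three")

# Forward fill within each key group of the sorted copy.
filled = ordered.groupby(["one", "two"])["three"].ffill()

# Assignment aligns on the index labels, so df keeps its original row order.
df["three"] = filled
print(df)
```

If a group held several different values, the NaN rows would take the largest one, since in ascending order it is the last value before the NaNs.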
