Suppose we have the following dataframe. I want to fill the null values in the column height, grouped by the column Subject, under the following conditions.
Note: In the desired dataframe, every row of a given Subject must have the same height value.
import numpy as np
import pandas as pd
df = pd.DataFrame({'Subject': [1, 1, 2, 2, 3, 3], 'x': ['AA', 'AA', 'BB', 'BB', 'AA', 'AA'], 'height': [130, np.nan, np.nan, 170, np.nan, np.nan]})
Here is the desired dataframe.
You can first fill the missing values using a grouped ffill() and bfill(), and then use the column median for whatever is left:
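# Fill forward, then backward, within each Subject
df['height'] = df.groupby('Subject')['height'].transform(lambda s: s.ffill().bfill())
# Subjects with no known height fall back to the median of the whole column
df['height'] = df['height'].fillna(df['height'].median())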
Output:
Subject x height
0 1 AA 130.0
1 1 AA 130.0
2 2 BB 170.0
3 2 BB 170.0
4 3 AA 150.0
5 3 AA 150.0
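For reference, the two steps above can be put together as a self-contained sketch. The helper name fill_height is hypothetical, and the sketch uses a grouped transform with ffill().bfill() rather than fillna(method=...), since the method argument is deprecated in recent pandas releases. Treat it as one possible way to write this, not the only one.

```python
import numpy as np
import pandas as pd


def fill_height(df):
    """Hypothetical helper: fill 'height' within each Subject, then by the overall median."""
    out = df.copy()
    # Step 1: propagate known heights forward and backward inside each Subject.
    out['height'] = out.groupby('Subject')['height'].transform(lambda s: s.ffill().bfill())
    # Step 2: Subjects with no known height get the median of the filled column.
    out['height'] = out['height'].fillna(out['height'].median())
    return out


df = pd.DataFrame({
    'Subject': [1, 1, 2, 2, 3, 3],
    'x': ['AA', 'AA', 'BB', 'BB', 'AA', 'AA'],
    'height': [130, np.nan, np.nan, 170, np.nan, np.nan],
})
filled = fill_height(df)

# Check the requirement from the question: one height value per Subject.
values_per_subject = filled.groupby('Subject')['height'].nunique()
print(filled)
print((values_per_subject == 1).all())
```

Note that if a Subject had two different known heights, ffill and bfill would keep both of them, so the final check would fail for that Subject.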
Edit: If the median should instead be taken over the rows that share the same x value as the missing row, rather than over the whole dataset, you could follow @xicoaio's advice and replace the second statement (the whole-column median fill) with:
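# Keep known heights; fill missing ones with the median height of the rows sharing the same x
df['height'] = df.apply(lambda row: row['height'] if pd.notna(row['height']) else df.loc[df['x'] == row['x'], 'height'].median(), axis=1)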
Output:
Subject x height
0 1 AA 130.0
1 1 AA 130.0
2 2 BB 170.0
3 2 BB 170.0
4 3 AA 130.0
5 3 AA 130.0
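The apply version above works row by row. A vectorized sketch of the same idea, assuming the grouped ffill/bfill step has already run, builds a per-row median with groupby('x') and transform('median') and passes it to fillna. This is an alternative way to write it, not @xicoaio's original code; the variable name median_by_x is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Subject': [1, 1, 2, 2, 3, 3],
    'x': ['AA', 'AA', 'BB', 'BB', 'AA', 'AA'],
    'height': [130, np.nan, np.nan, 170, np.nan, np.nan],
})

# Step 1 (as before): fill forward and backward within each Subject.
df['height'] = df.groupby('Subject')['height'].transform(lambda s: s.ffill().bfill())

# Median height of each x group, aligned to the original rows (NaN values are skipped).
median_by_x = df.groupby('x')['height'].transform('median')

# Only the rows that are still missing take their x group's median.
df['height'] = df['height'].fillna(median_by_x)
print(df)
```

If every height in an x group is missing, that group's median is itself NaN, so those rows would stay unfilled and would need a further fallback such as the whole-column median.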