Suppose we have a df:
A B
apple 1.0
apple 2.0
apple NA
orange NA
orange 7.0
melon 14.0
melon NA
melon 15.0
melon 16.0
To replace the NA values, we can use df["B"].fillna(df["B"].median()), but that fills every NA with the median of all the data in "B".
Is there a way to replace each NA with the median of its own group in A instead (like below)?
A B
apple 1.0
apple 2.0
apple **1.5**
orange **7.0**
orange 7.0
melon 14.0
melon **15.0**
melon 15.0
melon 16.0
Thanks!
In pandas, you can use transform to compute each row's group median, then pass the result to fillna as the fill values:
>>> med = df.groupby('A')['B'].transform('median')
>>> df['B'].fillna(med)
0 1.0
1 2.0
2 1.5
3 7.0
4 7.0
5 14.0
6 15.0
7 15.0
8 16.0
Name: B, dtype: float64
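For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It assumes the NA values in the question are stored as NaN in a float column, and it rebuilds the question's df by hand. Note that fillna returns a new Series, so the result has to be assigned back if you want to keep it.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (NA assumed to be NaN).
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# transform returns a Series aligned with df's index, holding the
# median of each row's group; NaN values are skipped when computing it.
med = df.groupby("A")["B"].transform("median")

# fillna does not modify df in place, so assign the result back.
df["B"] = df["B"].fillna(med)
print(df)
```

If a group contains only NA values, its median is itself NaN, so those rows would stay unfilled with this approach.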
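In R, you can use na.aggregate (from zoo) together with data.table to replace each NA with the mean of its group. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'A', and apply na.aggregate to 'B'. Note that na.aggregate uses the mean by default; for this data the group means and medians happen to coincide, and passing FUN = median gives the median asked for in the question.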
library(zoo)
library(data.table)
setDT(df)[, B:= na.aggregate(B), A]
df
# A B
#1: apple 1.0
#2: apple 2.0
#3: apple 1.5
#4: orange 7.0
#5: orange 7.0
#6: melon 14.0
#7: melon 15.0
#8: melon 15.0
#9: melon 16.0