I have a pandas DataFrame that contains some outlier values. I would like to replace them with the median of the data, computed as if those outliers were not there.
id Age
10236 766105
11993 288
9337 205
38189 88
35555 82
39443 75
10762 74
33847 72
21194 70
39450 70
So, I want to replace all values > 75 with the median of the remaining data, i.e., the median of 70, 70, 72, 74, 75.
I'm trying the following, but somehow the code below is not working:
df['age'].replace(df.age>75,0,inplace=True)
In this technique, we replace the extreme values with the mode value. You can use the median or the mean instead, but the mean is not advised because it is highly susceptible to outliers.
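To illustrate why the mean is a risky choice, here is a small sketch using the Age values from the question. The names `ages` and `clean` are only illustrative, and the cutoff of 75 is the one the question uses.

```python
import pandas as pd

# Age values from the question; `ages` is an illustrative name
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70])

mean_all = ages.mean()      # pulled far upward by the single value 766105
median_all = ages.median()  # stays close to the bulk of the data

# Same statistics on the values that are not outliers (<= 75)
clean = ages[ages <= 75]
mean_clean = clean.mean()
median_clean = clean.median()

print("all values:   mean =", mean_all, " median =", median_all)
print("<= 75 only:   mean =", mean_clean, " median =", median_clean)
```

On this data the mean changes by several orders of magnitude once the outliers are dropped, while the median only moves from 78.5 to 72.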
I think this is what you are looking for: you can use loc to assign the values, then fill the NaN entries.
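import numpy as np
# Median of the non-outlier values (<= 75, so that 75 itself is included)
median = df.loc[df['Age'] <= 75, 'Age'].median()
df.loc[df['Age'] > 75, 'Age'] = np.nan
df['Age'] = df['Age'].fillna(median)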
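For reference, the loc / fillna approach can be run end to end on the sample data from the question. This is a minimal sketch: the `threshold` variable is an illustrative name, and the comparison uses `<=` so that 75 itself counts as a valid value, matching the 70, 70, 72, 74, 75 in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id":  [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

threshold = 75  # cutoff from the question; values above it are treated as outliers

# Median of the values that are NOT outliers: 70, 70, 72, 74, 75
median = df.loc[df["Age"] <= threshold, "Age"].median()

# Cast to float first so the column can hold NaN, blank out the outliers, then fill them
df["Age"] = df["Age"].astype(float)
df.loc[df["Age"] > threshold, "Age"] = np.nan
df["Age"] = df["Age"].fillna(median)

print(median)
print(df)
```

Filling only the Age column, rather than calling fillna on the whole frame, avoids overwriting missing values that other columns might contain.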
You can also use np.where in one line:
df["Age"] = np.where(df["Age"] > 75, median, df["Age"])
You can also use .mask, i.e.:
df["Age"] = df["Age"].mask(df["Age"] > 75, median)
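As a quick check that the np.where and .mask variants agree, here is a self-contained sketch on the Age values from the question. The names `ages`, `via_where` and `via_mask` are illustrative, and the median is again taken over the values <= 75.

```python
import numpy as np
import pandas as pd

# Age values from the question; `ages` is an illustrative name
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

# Median of the non-outlier values (70, 70, 72, 74, 75)
median = ages[ages <= 75].median()

# np.where returns a NumPy array: median where the condition holds, the original value otherwise
via_where = pd.Series(np.where(ages > 75, median, ages), name="Age")

# Series.mask replaces the entries where the condition is True
via_mask = ages.mask(ages > 75, median)

# Compare by value; the two results may differ in dtype (float vs. int)
print(via_where.tolist() == via_mask.tolist())
```

Both variants leave the original Series untouched and return a new object, so the result has to be assigned back to the column, as in the one-liners above.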