
Python: replacing outlier values with median values

I have a pandas DataFrame that contains some outlier values. I would like to replace them with the median of the data, computed as if those outliers were not there.

id         Age
10236    766105
11993       288
9337        205
38189        88
35555        82
39443        75
10762        74
33847        72
21194        70
39450        70

So I want to replace every value > 75 with the median of the remaining values, i.e., the median of 70, 70, 72, 74, 75.

I'm trying to do the following:

  1. Replace with 0, all the values that are greater than 75
  2. Replace the 0s with median value.

But somehow the code below is not working:

df['age'].replace(df.age>75,0,inplace=True)
user4943236 asked Jul 29 '17 08:07





1 Answer

I think this is what you are looking for: you can use loc to mark the outliers as NaN, then fill the NaN entries with the median of the remaining values.

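import numpy as np
# Median of the non-outliers; <= keeps 75 itself, matching 70, 70, 72, 74, 75
median = df.loc[df['Age'] <= 75, 'Age'].median()
df.loc[df['Age'] > 75, 'Age'] = np.nan
# Fill only the Age column rather than every column in the frame
df['Age'] = df['Age'].fillna(median)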
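To check the approach against the sample data from the question, here is a minimal self-contained sketch. The explicit cast to float is my own addition, on the assumption that the column starts out as integers, which cannot hold NaN. The median is taken over values <= 75 so that 75 itself counts as a valid value, as the question describes.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

# Assumption: cast to float up front so the column can store NaN
df["Age"] = df["Age"].astype(float)

# Median of the values that are not outliers (70, 70, 72, 74, 75)
median = df.loc[df["Age"] <= 75, "Age"].median()

# Mark the outliers as missing, then fill them with the median
df.loc[df["Age"] > 75, "Age"] = np.nan
df["Age"] = df["Age"].fillna(median)

print(df)
```

With this data the five values above 75 are all replaced by the same median, and the remaining rows are left unchanged.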

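You can also use np.where to do the replacement in one line, once median has been computed:

df["Age"] = np.where(df["Age"] > 75, median, df["Age"])

You can also use .mask, which replaces the values where the condition is True:

df["Age"] = df["Age"].mask(df["Age"] > 75, median)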
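If this replacement is needed in more than one place, the .mask variant could be wrapped in a small helper. The function name `replace_outliers_with_median` and its `threshold` parameter are hypothetical, chosen for illustration only. This is a sketch that assumes "outlier" simply means "greater than a fixed threshold", as in the question.

```python
import pandas as pd


def replace_outliers_with_median(series: pd.Series, threshold: float) -> pd.Series:
    """Return a new Series where values above `threshold` are replaced
    by the median of the values at or below it."""
    is_outlier = series > threshold
    # Median computed only over the values that are not outliers
    median = series[~is_outlier].median()
    # mask() replaces entries where the condition is True
    return series.mask(is_outlier, median)


# Sample ages copied from the question
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")
cleaned = replace_outliers_with_median(ages, threshold=75)
print(cleaned.tolist())
```

The helper returns a new Series and leaves the input untouched, so the result can be assigned back with `df["Age"] = replace_outliers_with_median(df["Age"], 75)` when the in-place behaviour is wanted.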
Bharath answered Oct 23 '22 03:10
