Pandas: replace outliers in all columns with nan

Question

I have a data frame with 3 columns, for ex

c1,c2,c3 
10000,1,2 
1,3,4 
2,5,6 
3,1,122 
4,3,4 
5,5,6 
6,155,6

I want to replace the outliers in all the columns which are outside 2 sigma. Using the below code, I can create a dataframe without the outliers.

df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 2).all(axis=1)]


c1,c2,c3 
1,3,4 
2,5,6 
4,3,4 
5,5,6

I can find the outliers for each column separately and replace with "nan", but that would not be the best way as the number of lines in the code increases with the number of columns. There must be a better way of doing this. May be boolean output from the above command for rows and then replace "TRUE" with "nan".

Any suggestions, many thanks.

piRSquared · Accepted Answer

pandas
Use pd.DataFrame.mask

df.mask(df.sub(df.mean()).div(df.std()).abs().gt(2))

    c1   c2  c3 
0  NaN  1.0  2.0
1  1.0  3.0  4.0
2  2.0  5.0  6.0
3  3.0  1.0  NaN
4  4.0  3.0  4.0
5  5.0  5.0  6.0
6  6.0  NaN  6.0

numpy

v = df.values
mask = np.abs((v - v.mean(0)) / v.std(0)) > 2
pd.DataFrame(np.where(mask, np.nan, v), df.index, df.columns)

    c1   c2  c3 
0  NaN  1.0  2.0
1  1.0  3.0  4.0
2  2.0  5.0  6.0
3  3.0  1.0  NaN
4  4.0  3.0  4.0
5  5.0  5.0  6.0
6  6.0  NaN  6.0

Pandas: replace outliers in all columns with nan

Tags:

replace

pandas

outliers

Sridhar

1 Answers

piRSquared

Recent Activity

Donate For Us

Pandas: replace outliers in all columns with nan

Tags:

replace

pandas

outliers

Sridhar

1 Answers

piRSquared

Related questions

Recent Activity

Donate For Us