
How to floor and cap values to remove outliers

Tags: python, pandas

How can I calculate the 99th and 1st percentiles of each column and use them as a cap and a floor? If a value is >= the 99th percentile, it should be replaced with the 99th percentile; similarly, if a value is <= the 1st percentile, it should be replaced with the 1st percentile.

import numpy as np
import pandas as pd

np.random.seed(2)
df = pd.DataFrame({'value1': np.random.randn(100), 'value2': np.random.randn(100)})
df['lrnval'] = np.where(np.random.random(df.shape[0])>=0.7, 'learning', 'validation')

If we have hundreds of columns, can we use the apply function instead of a loop?

Gavin asked Sep 13 '25 06:09


1 Answer

Based on Abdou's answer, the following might save you some time:

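num_cols = df.select_dtypes(include='number').columns  # skip string columns such as 'lrnval'
for col in num_cols:
    low, high = df[col].quantile([0.01, 0.99]).values
    # use .loc rather than chained indexing so the assignment hits df itself
    df.loc[df[col] <= low, col] = low
    df.loc[df[col] >= high, col] = high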

or use numpy.clip:

import numpy as np
for col in df.select_dtypes(include='number').columns:  # skip string columns such as 'lrnval'
    low, high = df[col].quantile([0.01, 0.99]).values
    df[col] = np.clip(df[col], low, high)
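
Since the question asks about avoiding the loop when there are hundreds of columns, here is a rough sketch of a loop-free variant. It assumes only numeric columns should be capped, and the helper name cap_and_floor is made up for illustration. It relies on DataFrame.quantile and DataFrame.clip with axis=1, so each column is clipped to its own bounds in one call; an apply-based form is shown alongside for comparison.

```python
import numpy as np
import pandas as pd


def cap_and_floor(frame, lower=0.01, upper=0.99):
    """Return a copy with each numeric column clipped to its own percentiles.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = frame.copy()
    # Assumption: only numeric columns are capped; others are left untouched.
    num_cols = out.select_dtypes(include="number").columns
    # Two rows (lower and upper quantile), one column per numeric column.
    bounds = out[num_cols].quantile([lower, upper])
    # axis=1 aligns each bound Series with the column labels.
    out[num_cols] = out[num_cols].clip(
        lower=bounds.iloc[0], upper=bounds.iloc[1], axis=1
    )
    return out


# Same sample data as in the question.
np.random.seed(2)
df = pd.DataFrame({"value1": np.random.randn(100), "value2": np.random.randn(100)})
df["lrnval"] = np.where(np.random.random(df.shape[0]) >= 0.7, "learning", "validation")

capped = cap_and_floor(df)

# Per-column form using apply, as asked in the question.
num_cols = df.select_dtypes(include="number").columns
capped_apply = df[num_cols].apply(
    lambda s: s.clip(s.quantile(0.01), s.quantile(0.99))
)
```

Both forms should give the same numbers. The clip call with axis=1 avoids a Python-level function call per column, which may matter with many columns, while the apply form is arguably easier to read.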
lleiou answered Sep 14 '25 19:09