How can I calculate the 1st and 99th percentiles of each column and use them as a floor and a cap? If a value is >= the 99th percentile, it should be replaced with the 99th-percentile value; similarly, if a value is <= the 1st percentile, it should be replaced with the 1st-percentile value.
import numpy as np
import pandas as pd

np.random.seed(2)
df = pd.DataFrame({'value1': np.random.randn(100), 'value2': np.random.randn(100)})
df['lrnval'] = np.where(np.random.random(df.shape[0])>=0.7, 'learning', 'validation')
If we have hundreds of columns, can we use the apply function instead of a loop?
Based on Abdou's answer, the following might save you some time:
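# Only numeric columns have percentiles; 'lrnval' holds strings.
for col in df.select_dtypes(include='number').columns:
    low, high = df[col].quantile([0.01, 0.99]).values
    # Use .loc rather than chained indexing, which may not modify df.
    df.loc[df[col] <= low, col] = low
    df.loc[df[col] >= high, col] = high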
or use numpy.clip:
import numpy as np
# Skip non-numeric columns such as 'lrnval'.
for col in df.select_dtypes(include='number').columns:
    low, high = df[col].quantile([0.01, 0.99]).values
    df[col] = np.clip(df[col], low, high)
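On the question of avoiding the loop: a possible sketch, not taken from the answers above, is to compute both quantiles for all numeric columns in one call and pass them to DataFrame.clip with axis=1. The helper names winsorize_columns and winsorize_apply are made up for illustration, and the example assumes the string column 'lrnval' should be left untouched.

```python
import numpy as np
import pandas as pd


def winsorize_columns(df, lower=0.01, upper=0.99):
    """Return a copy of df with each numeric column clipped to its own quantiles."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    # One row per quantile, one column per numeric column.
    bounds = out[num_cols].quantile([lower, upper])
    # axis=1 aligns the per-column bounds with the DataFrame's columns.
    out[num_cols] = out[num_cols].clip(
        lower=bounds.iloc[0], upper=bounds.iloc[1], axis=1
    )
    return out


def winsorize_apply(df, lower=0.01, upper=0.99):
    """Same idea using apply, clipping one column (Series) at a time."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns
    out[num_cols] = out[num_cols].apply(
        lambda s: s.clip(s.quantile(lower), s.quantile(upper))
    )
    return out


# Same sample data as in the question.
np.random.seed(2)
df = pd.DataFrame({'value1': np.random.randn(100), 'value2': np.random.randn(100)})
df['lrnval'] = np.where(np.random.random(df.shape[0]) >= 0.7, 'learning', 'validation')

capped = winsorize_columns(df)
capped_apply = winsorize_apply(df)
```

Both helpers return a new DataFrame and leave df unchanged. The apply version still visits each column in Python, so with hundreds of columns the single clip call is likely the cheaper of the two, though that is worth timing on your own data.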