How to remove outliers from groups based on percentile

Question

I have a table df like this, but longer and with many other values in the type column.

I want to remove all outlier records from df, using the 95th percentile of weight as the cutoff, computed separately for each value in the type column.

For a single value of type, I do it like this:

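import numpy as np

my_perc = 95  # np.percentile expects a percentage in [0, 100]
temp = df[df['type'] == 'a']
# keep rows strictly below the 95th percentile of this one group
temp[temp.weight < np.percentile(temp.weight, my_perc)]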

Now I would like to do this automatically for the whole table df, taking into account individual groups in the type column.

I also tried this:

df[df.groupby(['type'])['weight'] < np.percentile(df.weight, my_perc)]

But it doesn't work.

Do you have any ideas on how to do this?

sdom · Accepted Answer

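OK, this seems to solve it. transform evaluates the lambda separately for each type group and returns a result aligned with df's index, so it can be used as a row mask. Note that quantile takes a fraction (0.95), not a percentage (95) as np.percentile does: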

my_perc = 0.95
df[df.groupby('type')['weight'].transform(lambda x : x < x.quantile(my_perc))]
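To check the idea on something concrete, here is a minimal sketch. The sample data is made up (the original table is not shown in the question), and the names cutoff and filtered are only for illustration. It computes each group's cutoff first and then compares, which is a small variation on the one-liner above rather than a different method.

```python
import pandas as pd

# Hypothetical sample data: two groups, each with one obvious outlier.
df = pd.DataFrame({
    "type": ["a"] * 5 + ["b"] * 5,
    "weight": [1.0, 2.0, 3.0, 4.0, 100.0, 10.0, 20.0, 30.0, 40.0, 500.0],
})

my_perc = 0.95  # Series.quantile expects a fraction in [0, 1]

# For every row, the 95th-percentile weight of that row's own group.
# transform broadcasts the per-group scalar back to df's index.
cutoff = df.groupby("type")["weight"].transform(lambda x: x.quantile(my_perc))

# Keep only rows strictly below their own group's cutoff.
filtered = df[df["weight"] < cutoff]
print(filtered)
```

With this data, the rows with weight 100.0 (type a) and 500.0 (type b) are dropped and the other eight rows are kept. Keeping cutoff as a separate Series also makes it easy to inspect which threshold was applied to each row.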
