I have a DataFrame `df` like this, but longer and with many more values in the `type` column:
| type | weight |
|---|---|
| a | 35.1 |
| a | 36.7 |
| b | 100.2 |
| b | 99.3 |
| b | 102.0 |
| b | 5.0 |
| a | 38.2 |
| a | 250.8 |
I want to remove all outlier rows from `df` using the 95th percentile, with the percentile computed separately for each value in the `type` column.
For a single value of type, I do it like this:
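```python
import numpy as np

my_perc = 95  # np.percentile expects a value on the 0-100 scale
temp = df[df['type'] == 'a']
# Keep only rows of type 'a' whose weight is below that group's 95th percentile
temp[temp.weight < np.percentile(temp.weight, my_perc)]
```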
Now I would like to do this automatically for the whole of `df`, treating each group in the `type` column separately.
I also tried this:

```python
df[df.groupby(['type'])['weight'] < np.percentile(df.weight, my_perc)]
```

But it doesn't work.
Does anyone have an idea how to do this?
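For reference, the attempt above fails for two reasons. First, `df.groupby(['type'])['weight']` is a GroupBy object rather than a Series of values, so comparing it with a number does not give the row-wise boolean mask that `df[...]` needs. Second, `np.percentile(df.weight, my_perc)` computes a single 95th percentile over the whole column, not one per type. A minimal sketch, assuming pandas and using the eight sample rows from the table as `df`, of how `groupby(...).transform` broadcasts a per-type threshold back onto every row:

```python
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "type": ["a", "a", "b", "b", "b", "b", "a", "a"],
    "weight": [35.1, 36.7, 100.2, 99.3, 102.0, 5.0, 38.2, 250.8],
})

my_perc = 0.95  # Series.quantile takes a fraction, not 0-100

# One threshold for the whole column: the type groups are ignored
global_threshold = df["weight"].quantile(my_perc)

# One threshold per type, aligned back to the original rows (same index as df)
per_type_threshold = df.groupby("type")["weight"].transform(
    lambda x: x.quantile(my_perc)
)

print(global_threshold)
print(df.assign(threshold=per_type_threshold))
```

With the default linear interpolation, the single global threshold is about 198.72, while the per-type thresholds are about 218.91 for `a` and 101.73 for `b`.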
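OK, I think I've solved it:

```python
my_perc = 0.95  # quantile() takes a fraction, unlike np.percentile's 0-100 scale
# Boolean mask aligned to df: True where weight is below its own type's 95th percentile
df[df.groupby('type')['weight'].transform(lambda x : x < x.quantile(my_perc))]
```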
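To check the answer end to end, here is a self-contained sketch (assuming pandas and NumPy, with the sample table rebuilt as `df`) that applies the same `transform` mask and cross-checks it against the single-type approach from the question run once per type. `Series.quantile` and `np.percentile` both default to linear interpolation, so the two should agree. The `mask_alt` variant, which builds the threshold first and compares outside the groupby, is just an equivalent way of writing the same filter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "type": ["a", "a", "b", "b", "b", "b", "a", "a"],
    "weight": [35.1, 36.7, 100.2, 99.3, 102.0, 5.0, 38.2, 250.8],
})

my_perc = 0.95    # fraction, for Series.quantile
my_perc_pct = 95  # same cut-off on the 0-100 scale, for np.percentile

# The answer's approach: a boolean mask computed inside each type group
mask = df.groupby("type")["weight"].transform(lambda x: x < x.quantile(my_perc))
filtered = df[mask]

# Equivalent idiom: broadcast each type's threshold, then compare row by row
threshold = df.groupby("type")["weight"].transform(lambda x: x.quantile(my_perc))
mask_alt = df["weight"] < threshold

# Cross-check: the question's single-type filter applied to every type in turn
manual = pd.concat(
    temp[temp.weight < np.percentile(temp.weight, my_perc_pct)]
    for _, temp in df.groupby("type")
).sort_index()

print(filtered)
```

On this sample the filter drops index 4 (`b`, 102.0) and index 7 (`a`, 250.8). Because the 95th percentile can never exceed a group's maximum, the strict `<` always removes each group's largest value, and it keeps low values such as the 5.0 in group `b`; catching those would need a lower cut-off as well.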