I would like to know what algorithm is used to determine the 'outliers' in a boxplot distribution in Seaborn.
On their website, under seaborn.boxplot, they simply state:
The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
I would really like to know what method they use. I've created boxplots from a dataframe and I seem to have a lot of 'outliers'.
Thanks
It appears, by testing, that seaborn uses whis=1.5 as the default. whis is defined as the:
Proportion of the IQR past the low and high quartiles to extend the plot whiskers.
For a normal distribution, the interquartile range contains 50% of the population, and the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR contains about 99.3%.
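As a rough check of that figure, here is a small sketch that uses only Python's standard library (statistics.NormalDist) to compute how much of a standard normal distribution lies between Q1 - 1.5 * IQR and Q3 + 1.5 * IQR. The variable names are just for illustration and it assumes perfectly normal data, which real data rarely is.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Quartiles of the theoretical distribution.
q1 = std_normal.inv_cdf(0.25)
q3 = std_normal.inv_cdf(0.75)
iqr = q3 - q1

# Whisker limits for whis=1.5.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Fraction of the population that falls between the two limits.
coverage = std_normal.cdf(upper_fence) - std_normal.cdf(lower_fence)
print(f"{coverage:.4f}")
```

The result is roughly 0.993, so with normal data you would expect somewhat under 1% of points to be drawn as outliers. Skewed or heavy-tailed data will usually show more.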
You can calculate it this way:
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1
It's an outlier if it is less than:
Q1 - 1.5 * IQR
or if it is greater than:
Q3 + 1.5 * IQR
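The rule above can be sketched as a small self-contained function. This is not seaborn's own code. The helper name iqr_outliers and the sample data are made up for illustration, and it uses the standard library's statistics.quantiles with method="inclusive", which interpolates linearly between data points in the same way as the pandas quantile default.

```python
from statistics import quantiles

def iqr_outliers(values, whis=1.5):
    """Return (lower_limit, upper_limit, outliers) using the IQR rule."""
    # n=4 yields the three cut points Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - whis * iqr
    upper_limit = q3 + whis * iqr
    # Anything strictly outside the limits is flagged as an outlier.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample
lower_limit, upper_limit, outliers = iqr_outliers(data)
print(lower_limit, upper_limit, outliers)
```

For this sample, Q1 is 3.25 and Q3 is 7.75, so the limits are -3.5 and 14.5 and only the value 30 is flagged. Passing a larger whis widens the limits and flags fewer points.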