 

Python Seaborn - How are outliers determined in boxplots

Tags:

python

seaborn

I would like to know what algorithm is used to determine the 'outliers' in a boxplot distribution in Seaborn.

On their website, the seaborn.boxplot page simply states:

The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.

I would really like to know what method they use. I've created boxplots from a dataframe and I seem to have a lot of 'outliers'.

[Image: boxplots of my dataframe]

Thanks

jlt199 asked Apr 06 '17 19:04


2 Answers

It appears, by testing, that seaborn uses whis=1.5 as the default.

whis is defined as the

Proportion of the IQR past the low and high quartiles to extend the plot whiskers.
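
To see which points get flagged for a given whis, one option is matplotlib's matplotlib.cbook.boxplot_stats, which computes the quartiles, whisker ends and outliers ("fliers") that matplotlib box plots draw. As far as I can tell, seaborn's boxplot hands this computation to matplotlib, so the numbers should match what seaborn draws, but treat that as an assumption about seaborn's internals. The data below is made up for illustration.

```python
from matplotlib.cbook import boxplot_stats

# Toy data (made up for illustration) with one obvious extreme value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

# boxplot_stats returns one dict of statistics per dataset; we pass a single dataset
stats = boxplot_stats(data, whis=1.5)[0]

print("Q1, median, Q3:", stats["q1"], stats["med"], stats["q3"])
print("IQR:", stats["iqr"])
# Whiskers end at the most extreme data points that still lie inside
# Q1 - whis*IQR and Q3 + whis*IQR
print("whiskers:", stats["whislo"], stats["whishi"])
# Points beyond the whiskers are drawn individually as outliers
print("fliers:", list(stats["fliers"]))
```

Raising whis (for example whis=3) widens the fences, so fewer points are drawn as outliers.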

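For a normal distribution, the interquartile range contains 50% of the population, and the range from Q1 - 1.5 * IQR to Q3 + 1.5 * IQR (about 2.7 standard deviations either side of the mean) contains about 99.3%, so only about 0.7% of normally distributed points are drawn as outliers.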
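
The claim about normal data can be checked numerically. A minimal sketch using only the standard library (statistics.NormalDist, Python 3.8+), assuming a standard normal distribution; the fractions are the same for any mean and standard deviation:

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
z = NormalDist()

q1 = z.inv_cdf(0.25)   # lower quartile, about -0.6745
q3 = z.inv_cdf(0.75)   # upper quartile, about  0.6745
iqr = q3 - q1          # about 1.349

whis = 1.5
lower_fence = q1 - whis * iqr   # about -2.698
upper_fence = q3 + whis * iqr   # about  2.698

# Fraction of the population between the fences, i.e. not flagged as outliers
inside = z.cdf(upper_fence) - z.cdf(lower_fence)
print(f"inside fences: {inside:.4f}, flagged: {1 - inside:.4f}")
```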

AlexG answered Nov 15 '22 00:11


You can calculate it this way:

Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)

IQR = Q3 - Q1

A value is an outlier if it is less than:

Q1 - 1.5 * IQR

or if it is greater than:

Q3 + 1.5 * IQR
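
Putting these steps together for a whole DataFrame: a minimal sketch with pandas that flags, column by column, the values outside the fences. The DataFrame and its column names are invented for illustration, and the variable whis simply mirrors seaborn's parameter name. Note that matplotlib draws the whiskers at the most extreme data points inside the fences, not at the fences themselves.

```python
import pandas as pd

# Hypothetical example data: two numeric columns, each with one extreme value
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 100],
    "b": [-50, 10, 11, 12, 13, 14, 15, 16, 17, 18],
})

whis = 1.5  # same default multiplier seaborn uses

# Quartiles per column (pandas' default linear interpolation)
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

lower = Q1 - whis * IQR
upper = Q3 + whis * IQR

# Boolean DataFrame: True where a value lies outside its column's fences.
# Comparing a DataFrame with a Series aligns the Series index on the columns.
outliers = (df < lower) | (df > upper)

print(outliers.sum())              # number of outliers per column
print(df.loc[outliers["a"], "a"])  # the flagged values in column "a"
```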

Gabriela Trindade answered Nov 15 '22 00:11