
Seaborn's boxplot whiskers meaning

Tags:

python

seaborn

I'm working with seaborn's box plot, and I can't seem to figure out the placement of the whiskers in the default settings.

Looking at the seaborn.boxplot docs I see that whis=1.5 which I assumed means that the whiskers are placed at UPPER_QUARTILE + IQR*1.5 and at LOWER_QUARTILE - IQR*1.5.

But even in the docs themselves, the whiskers clearly sit at different distances from the upper and lower quartiles.

[Image: box plot example from the seaborn docs]

It is easy to see that the lengths on both sides of the box to the whiskers are not equal, so my assumption is obviously wrong.

So how are the default whiskers placed? Or am I misunderstanding something more basic about the nature of the box plot?

asked Aug 05 '18 13:08 by bluesummers




1 Answer

IIRC, the whiskers extend to the lowest (highest) data point still within 1.5 IQR of the lower (upper) quartile. So depending on where the data points actually are, the whiskers on the two sides won't necessarily be the same length.

The Matplotlib docs for the whis argument of boxplot() (Seaborn is built on top of Matplotlib) seem to confirm this:

whis : float, sequence, or string (default = 1.5) As a float, determines the reach of the whiskers to the beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3-Q1), the upper whisker will extend to last datum less than Q3 + whis*IQR). Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points.
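To make the rule above concrete, here is a minimal sketch that computes the whisker ends by hand for a small made-up sample. The function name `whisker_ends` and the sample data are hypothetical, chosen only for illustration. It uses the standard library's `statistics.quantiles` with `method="inclusive"`, which I believe matches the linear-interpolation quartiles Matplotlib gets from NumPy by default, but treat that as an assumption rather than a guarantee.

```python
import statistics


def whisker_ends(data, whis=1.5):
    """Return (q1, q3, lower_whisker, upper_whisker, outliers) for a sample.

    Hypothetical helper: the whiskers stop at the most extreme data points
    that still lie within whis * IQR of the quartiles, not at the fences.
    """
    # Quartiles by linear interpolation (assumed to match Matplotlib's default).
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # The fences are the theoretical limits Q1 - whis*IQR and Q3 + whis*IQR.
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr

    # Whiskers are drawn to actual data points inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return q1, q3, min(inside), max(inside), outliers


# Made-up sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
q1, q3, lower, upper, outliers = whisker_ends(data)

print("Q1, Q3:", q1, q3)
print("lower whisker length:", q1 - lower)
print("upper whisker length:", upper - q3)
print("outliers:", outliers)
```

For this sample the fences are symmetric around the box, but the whiskers are not, because each one stops at the last real data point inside its fence. The value 30 lies beyond the upper fence and would be drawn as an individual outlier point.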

answered Sep 19 '22 02:09 by andrew_reece