I'm currently working on my master's dissertation. I've processed all my data in custom-coded Python, and one of my main methods of displaying data is the boxplot in matplotlib. I've been looking at the documentation, but I can't see anything about how it categorises outliers (or "fliers") and excludes them from the whisker range.
It's not the end of the world if I can't find this information, but it feels incomplete to me if I don't fully describe my statistical instruments in the methodology chapter.
From the matplotlib.pyplot API documentation for boxplot: boxplot has a whis parameter that specifies the reach of the whiskers, with a default value of 1.5.
whis : float, sequence, or string (default = 1.5)
As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
The default reach of the whiskers is thus 1.5 times the interquartile range (IQR = Q3 - Q1). In practice, that means that with the default setting any value lower than Q1 - 1.5*IQR or higher than Q3 + 1.5*IQR is treated as an outlier and drawn as an individual point.
If you pass a non-default float for whis, the same rule applies with that multiplier in place of 1.5.
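If it helps for the methodology chapter, the rule can be reproduced by hand. The following is only a minimal sketch of the rule quoted above, not matplotlib's own code: the helper names `percentile` and `whiskers_and_fliers` are made up for illustration, the quartiles are assumed to use linear interpolation between neighbouring data points (the default behaviour of numpy.percentile), and a value lying exactly on a fence is assumed to count as inside the whiskers.

```python
def percentile(sorted_vals, p):
    """p-th percentile of an already sorted list, using linear
    interpolation between the two nearest data points (assumed method)."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)                                  # index at or below position k
    hi = min(lo + 1, len(sorted_vals) - 1)       # index above, clamped to the end
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def whiskers_and_fliers(data, whis=1.5):
    """Apply the Q1 - whis*IQR / Q3 + whis*IQR rule to a sequence of numbers."""
    vals = sorted(data)
    q1 = percentile(vals, 25)
    q3 = percentile(vals, 75)
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Data inside the fences decide where the whiskers end.
    inside = [v for v in vals if low_fence <= v <= high_fence]
    # Everything outside the fences is an outlier ("flier").
    fliers = [v for v in vals if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        # Fall back to the quartile if no datum lies inside the fences.
        "whisker_low": min(inside) if inside else q1,
        "whisker_high": max(inside) if inside else q3,
        "fliers": fliers,
    }


stats = whiskers_and_fliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(stats)
```

For this sample, Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the fences sit at -3.5 and 14.5. The whiskers therefore end at the data points 1 and 9, and 50 is the only flier. Note that the whiskers stop at the most extreme data points inside the fences, not at the fences themselves.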