From https://en.wikipedia.org/wiki/Box_plot
The whisker of the box plot has the following possible definitions:
In pandas, when I plot with:
df['data'].plot(kind = 'box', sym='bD')
which definition is the whisker using?
Also, for the matplotlib library:
ax.boxplot(dfa.duration)
which definition is the whisker using?
Thanks!
The boxplot documentation says this about the whiskers:
whis: float, sequence, or string (default = 1.5). As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3-Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
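To make the quoted rule concrete, here is a minimal sketch that computes the whiskers by hand and compares them with matplotlib.cbook.boxplot_stats, the helper matplotlib provides for computing box plot statistics. The sample values are made up for illustration, and the manual computation assumes the quartiles come from np.percentile with its default interpolation.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Made-up sample with one obvious outlier (30.0).
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])

# Manual version of the documented rule with whis = 1.5.
whis = 1.5
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - whis * iqr
upper_fence = q3 + whis * iqr

# Whiskers stop at the most extreme data points inside the fences.
manual_whislo = data[data >= lower_fence].min()
manual_whishi = data[data <= upper_fence].max()

# Statistics computed by matplotlib's own helper.
stats = boxplot_stats(data, whis=whis)[0]

print("manual whiskers:", manual_whislo, manual_whishi)
print("matplotlib whiskers:", stats["whislo"], stats["whishi"])
print("fliers:", stats["fliers"])
```

Note that the whiskers end at actual data points, not at the fences themselves, which is why the upper whisker stops well short of Q3 + 1.5*IQR here.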
The only definition from the list in the question that cannot easily be implemented with this argument is "one standard deviation"; all the others can be set directly. The default is the 1.5*IQR definition.
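If the standard deviation definition is really needed, one possible workaround (not something the whis argument offers) is to compute the statistics yourself and draw them with Axes.bxp, which accepts precomputed statistics. The sketch below assumes "one standard deviation" means the mean plus or minus one population standard deviation; the sample values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.cbook import boxplot_stats

# Made-up sample data.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0])

# Start from the default statistics, then override the whisker ends.
stats = boxplot_stats(data)[0]
mean, std = data.mean(), data.std()  # std with ddof=0 (an assumption)
stats["whislo"] = mean - std
stats["whishi"] = mean + std

# Points outside the new whiskers are drawn as fliers.
outside = (data < stats["whislo"]) | (data > stats["whishi"])
stats["fliers"] = data[outside]

fig, ax = plt.subplots()
artists = ax.bxp([stats])  # draw from the precomputed statistics
fig.savefig("std_whiskers.png")
```

With this definition the whiskers no longer end at data points, and for skewed data a whisker end can fall inside the box, so the result should be checked visually.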
pandas.DataFrame.boxplot calls the matplotlib function, so the two should produce identical whiskers.
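One way to check this claim is to draw the same data both ways and compare the y-coordinates of the lines each call adds to its axes. This is only a sketch: the column name data follows the question, the values are made up, and comparing ax.lines pairwise assumes both calls create their lines in the same order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up sample data in a column named as in the question.
df = pd.DataFrame({"data": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]})

fig, (ax_pd, ax_mpl) = plt.subplots(1, 2)
df["data"].plot(kind="box", ax=ax_pd)   # pandas wrapper
ax_mpl.boxplot(df["data"].to_numpy())   # matplotlib directly

def line_ydata(ax):
    """Return the y-coordinates of every line drawn on the axes."""
    return [np.asarray(line.get_ydata(), dtype=float) for line in ax.lines]

pd_lines = line_ydata(ax_pd)
mpl_lines = line_ydata(ax_mpl)

# The box, whiskers, caps, median and fliers should match line for line.
same = len(pd_lines) == len(mpl_lines) and all(
    np.array_equal(a, b) for a, b in zip(pd_lines, mpl_lines)
)
print("identical box plot geometry:", same)
```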