I have a series of box plots I am trying to make, each of which has a different range. I tried setting ylim by determining the max and min of each separate series. However, the min in many cases is an outlier, and so the plot is compressed. How can I select the same limit used by the 'whiskers' of the plot (plus a small margin)?
E.g., right now I'm doing this:
ax = df.boxplot(column='feature')
ymax = max(df['feature'])
ymin = min(df['feature'])
ax.set_ylim([ymin, ymax])  # bottom first, then top
I'd like to set ymax, ymin to be the whiskers of the box plot.
As an alternative to what @unutbu suggested, you could avoid plotting the outliers and then use ax.margins(y=0) (or some small eps) to scale the limits to the range of the whiskers.
For example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.poisson(5, size=(100, 5)))
fig, ax = plt.subplots()
# Note: sym='' hides the outlier markers. showfliers=False is more readable, but requires a recent version, IIRC.
box = df.boxplot(ax=ax, sym='')
ax.margins(y=0)
plt.show()
And if you'd like a bit of room around the largest "whiskers", use ax.margins(y=0.05) to pad each end by 5% of the plotted range instead of 0%:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.poisson(5, size=(100, 5)))
fig, ax = plt.subplots()
box = df.boxplot(ax=ax, sym='')
ax.margins(y=0.05)
plt.show()
You could set showfliers=False in the boxplot, so the outliers don't get plotted.
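A minimal sketch of that approach, calling matplotlib's `ax.boxplot` directly with `showfliers=False` and then reading the whisker end points back from the returned dictionary, whose `'whiskers'` entry holds one `Line2D` per whisker. The helper name `whisker_ylim`, the random Poisson data and the 5% padding are illustrative assumptions, not part of any library API.

```python
import numpy as np
import matplotlib.pyplot as plt


def whisker_ylim(bp, pad=0.05):
    """Hypothetical helper: (low, high) y-limits spanning every whisker.

    `bp` is the dict returned by Axes.boxplot; `pad` is the fraction of the
    whisker-to-whisker span added below and above (an illustrative choice).
    """
    # Each whisker is a Line2D running from the box edge (quartile) to the
    # whisker cap, so its y-data contains the whisker end position.
    ys = np.concatenate([line.get_ydata() for line in bp["whiskers"]])
    low, high = ys.min(), ys.max()
    margin = pad * (high - low)
    return low - margin, high + margin


rng = np.random.default_rng(0)
data = rng.poisson(5, size=(100, 5))  # a 2D array gives one box per column

fig, ax = plt.subplots()
bp = ax.boxplot(data, showfliers=False)  # outliers are not drawn at all
ax.set_ylim(*whisker_ylim(bp))
# plt.show()  # uncomment to display the figure interactively
```

Reading the limits from the drawn artists means they always agree with whatever `whis` setting matplotlib actually used.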
Since you ask specifically about the whiskers, this is how they are calculated, according to the matplotlib boxplot documentation, with a default of 1.5:
whis : float, sequence (default = 1.5) or string
As a float, determines the reach of the whiskers past the first and third quartiles (e.g., Q3 + whis*IQR, IQR = interquartile range, Q3-Q1). Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentile (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string ‘range’ to force the whiskers to the min and max of the data. In the edge case that the 25th and 75th percentiles are equivalent, whis will be automatically set to ‘range’.
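You could do the same calculation and set your ylim to that. Note that each whisker ends at the most extreme data point lying inside Q1 - whis*IQR or Q3 + whis*IQR, not at those fence values themselves.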
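A sketch of doing that calculation yourself for a single column, assuming the default `whis=1.5`. `whisker_bounds` is a hypothetical helper, and the sketch assumes pandas' default linear quantile interpolation matches the `np.percentile` call matplotlib uses for the quartiles. Outliers are still drawn, but they fall outside the limits and get clipped.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def whisker_bounds(s, whis=1.5):
    """Hypothetical helper: whisker ends for a numeric Series.

    The fences are Q1 - whis*IQR and Q3 + whis*IQR; each whisker reaches
    the most extreme data point still inside its fence, never past the box.
    """
    s = s.dropna()
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    low = min(s[s >= lo_fence].min(), q1)   # whisker never ends inside the box
    high = max(s[s <= hi_fence].max(), q3)
    return low, high


rng = np.random.default_rng(0)
df = pd.DataFrame({"feature": rng.poisson(5, size=100)})

low, high = whisker_bounds(df["feature"])
pad = 0.05 * (high - low)  # the "small margin" from the question

fig, ax = plt.subplots()
df.boxplot(column="feature", ax=ax)
ax.set_ylim(low - pad, high + pad)  # bottom first, then top
```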