I am making standard Matplotlib boxplots using the plt.boxplot() command. My line of code that creates the boxplot is:
bp = plt.boxplot(data, whis=[5, 95], showfliers=True)
Because my data are widely spread, I am getting a lot of fliers outside the range of the whiskers. To get a cleaner, publication-quality plot, I would like to show only a single flier at the maximum and a single flier at the minimum of the data, instead of all fliers. Is this possible? I don't see any built-in option in the documentation to do this.
(I can set the range of the whiskers to max/min, but this is not what I want. I would like to keep the whiskers at the 5th and 95th percentiles).
Below is the figure I am working on. Notice the density of fliers.
plt.boxplot() returns a dictionary in which the key 'fliers' holds the flier points as Line2D objects. You can modify them after the call, before the figure is shown or saved, like this:
Only on matplotlib >= 1.4.0
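import numpy as np
bp = plt.boxplot(data, whis=[5, 95], showfliers=True)
# bp['fliers'] is a list with one Line2D per box; each holds all of
# that box's flier points (low and high), not necessarily sorted.
fliers = bp['fliers']
for fly in fliers:
    xdata, ydata = fly.get_data()
    if len(ydata) == 0:
        continue  # this box has no fliers
    # Keep only the lowest and the highest flier, selected by value.
    lo, hi = np.argmin(ydata), np.argmax(ydata)
    fly.set_data([xdata[lo], xdata[hi]], [ydata[lo], ydata[hi]])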
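A self-contained variant of the same idea, picking the extremes by value and skipping boxes without fliers, might look like the sketch below. The synthetic data, the Agg backend, and the helper name keep_extreme_fliers are assumptions for illustration and are not part of matplotlib. It also assumes each box has fliers on both sides of the whiskers, as is usual with percentile whiskers.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def keep_extreme_fliers(bp):
    """Trim each flier Line2D of a boxplot dict to its lowest and highest point.

    Hypothetical helper; expects the one-Line2D-per-box layout (matplotlib >= 1.4).
    """
    for line in bp["fliers"]:
        x = np.asarray(line.get_xdata(), dtype=float)
        y = np.asarray(line.get_ydata(), dtype=float)
        if y.size == 0:
            continue  # this box has no fliers
        lo, hi = y.argmin(), y.argmax()
        line.set_data([x[lo], x[hi]], [y[lo], y[hi]])


rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # synthetic stand-in for the real data

fig, ax = plt.subplots()
bp = ax.boxplot(data, whis=[5, 95], showfliers=True)
keep_extreme_fliers(bp)
```

After the call, each entry of bp['fliers'] carries at most two points, so only the extreme markers are drawn when the figure is rendered.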
On older versions
If you are on an older version of matplotlib, the fliers for each boxplot are represented by two lines, not one. Thus, the loop would look something like this:
import numpy as np
fliers = bp['fliers']
for i in range(len(fliers)):
    xdata, ydata = fliers[i].get_data()
    if len(ydata) == 0:
        continue  # no fliers on this side of the box
    # Even indices (0, 2, ...) hold the upper fliers, so keep the maximum;
    # odd indices hold the lower fliers, so keep the minimum.
    if i % 2 == 0:
        idx = np.argmax(ydata)
    else:
        idx = np.argmin(ydata)
    fliers[i].set_data([xdata[idx]], [ydata[idx]])
Also note that the showfliers argument doesn't exist in matplotlib < 1.4, and the whis argument doesn't accept lists there.
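If it is unclear which of the two layouts an installation produces, one option is to compare the number of flier lines with the number of boxes rather than parse the version string. This is only a sketch with synthetic data; based on the description above, the ratio is expected to be 1 on matplotlib >= 1.4 and 2 on older versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only.
data = np.random.RandomState(1).normal(size=200)

fig, ax = plt.subplots()
bp = ax.boxplot(data)  # default arguments only, so no version-specific keywords

# One Line2D per box on newer versions, two (upper and lower) on older ones.
lines_per_box = len(bp["fliers"]) // len(bp["boxes"])
print(matplotlib.__version__, lines_per_box)
```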
Of course (for simple applications) you could plot the boxplot without fliers and add the max and min points to the plot:
bp = plt.boxplot(data, whis=[5, 95], showfliers=False)
sc = plt.scatter([1, 1], [data.min(), data.max()])
where [1, 1] gives the x-positions of the two points (by default, a single box is drawn at x = 1).
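With several boxes, the same approach needs one x-position per box. By default the boxes are drawn at x = 1, 2, ..., N, so a sketch for multiple datasets could look like this. The synthetic datasets and the marker styling are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Three synthetic samples standing in for the real data.
datasets = [rng.normal(loc=m, scale=1.0, size=500) for m in (0.0, 1.0, 2.0)]

fig, ax = plt.subplots()
# Whiskers at the 5th and 95th percentiles, no fliers drawn by boxplot itself.
bp = ax.boxplot(datasets, whis=[5, 95], showfliers=False)

# Default box positions are 1..N; place one marker at each extreme.
positions = np.arange(1, len(datasets) + 1)
mins = np.array([d.min() for d in datasets])
maxs = np.array([d.max() for d in datasets])
ax.scatter(positions, mins, marker="o", color="k", zorder=3)
ax.scatter(positions, maxs, marker="o", color="k", zorder=3)
```

If the boxes are placed with an explicit positions argument, the same values should be passed to scatter instead of the default 1..N.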
fliers = bp['fliers']
# Iterate through the Line2D objects for the fliers of each boxplot.
for i in range(len(fliers)):
    box = fliers[i]  # this accesses the x and y vectors for the fliers of each box
    # Note that you can use any two values from the xdata vector.
    box.set_data([[box.get_xdata()[0], box.get_xdata()[0]],
                  [np.min(box.get_ydata()), np.max(box.get_ydata())]])
Resulting figure, showing only max and min fliers: