Matplotlib boxplot show only max and min fliers

I am making standard Matplotlib boxplots using the plt.boxplot() command. My line of code that creates the boxplot is:

bp = plt.boxplot(data, whis=[5, 95], showfliers=True)

Because my data has a wide distribution, I am getting a lot of fliers outside the range of the whiskers. To get a cleaner, publication-quality plot, I would like to show only a single flier at the maximum and one at the minimum value of the data, instead of all fliers. Is this possible? I don't see any built-in option for this in the documentation.

(I can set the range of the whiskers to max/min, but this is not what I want. I would like to keep the whiskers at the 5th and 95th percentiles).

Below is the figure I am working on. Notice the density of fliers.

[Figure: Boxplots]
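With whis=[5, 95], roughly 10% of each dataset (about 5% in each tail) ends up outside the whiskers, which is why large datasets produce so many flier markers. The sketch below counts them with NumPy, assuming matplotlib's rule that points strictly beyond the two percentile values are drawn as fliers; count_fliers is a hypothetical helper, not a matplotlib function.

```python
import numpy as np


def count_fliers(values, lo_pct=5, hi_pct=95):
    """Hypothetical helper: count points outside the [lo_pct, hi_pct] percentiles."""
    values = np.asarray(values)
    lo, hi = np.percentile(values, [lo_pct, hi_pct])
    # Points strictly below the low percentile or above the high one.
    return int(np.sum((values < lo) | (values > hi)))


print(count_fliers(np.arange(1, 101)))  # 10: five values in each tail
```

For 100 evenly spaced values the 5th and 95th percentiles are 5.95 and 95.05, so 1 to 5 and 96 to 100 become fliers; with thousands of points the same 10% turns into hundreds of markers.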

Asked Feb 15 '15 by PJW.


2 Answers

plt.boxplot() returns a dictionary in which the key 'fliers' holds the fliers as Line2D objects. You can modify them after the boxplot is created, before the figure is shown or saved, like this:

Only on matplotlib >= 1.4.0

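import matplotlib.pyplot as plt
import numpy as np

bp = plt.boxplot(data, whis=[5, 95], showfliers=True)

# bp['fliers'] is a list with one Line2D object per box, holding all of
# that box's flier points (drawn as markers only).
fliers = bp['fliers']

# Iterate over it, keeping only the lowest and highest flier of each box.
for fly in fliers:
    fdata = fly.get_data()
    if len(fdata[1]) == 0:  # this box has no fliers
        continue
    fly.set_data([fdata[0][0], fdata[0][0]],
                 [np.min(fdata[1]), np.max(fdata[1])])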
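For a self-contained version of the loop above, here is a minimal sketch on synthetic data. The helper name keep_extreme_fliers, the lognormal test data and the Agg backend are assumptions for illustration; it relies on matplotlib >= 1.4 storing one Line2D of fliers per box.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np


def keep_extreme_fliers(bp):
    """Hypothetical helper: reduce each box's fliers to its lowest and highest point.

    Assumes matplotlib >= 1.4, where bp['fliers'] holds one Line2D per box.
    """
    for fly in bp["fliers"]:
        x, y = fly.get_data()
        if len(y) == 0:  # this box has no fliers
            continue
        # All fliers of one box share the same x position.
        fly.set_data([x[0], x[0]], [np.min(y), np.max(y)])


# Synthetic heavy-tailed data for three boxes (assumed for illustration).
rng = np.random.default_rng(42)
data = [rng.lognormal(sigma=1.0, size=1000) for _ in range(3)]

fig, ax = plt.subplots()
bp = ax.boxplot(data, whis=[5, 95], showfliers=True)
keep_extreme_fliers(bp)
```

Call fig.savefig(...) or plt.show() afterwards to render the result. Taking np.min and np.max of each box's y-values, rather than its first and last points, avoids depending on the order in which matplotlib stores the fliers.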

On older versions

If you are on an older version of matplotlib, the fliers for each boxplot are represented by two lines, not one. Thus, the loop would look something like this:

import numpy as np

fliers = bp['fliers']
for i in range(len(fliers)):
    fdata = fliers[i].get_data()
    if len(fdata[1]) == 0:  # no fliers on this side of this box
        continue
    # Even indices hold a box's upper fliers: keep the maximum y.
    # Odd indices hold its lower fliers: keep the minimum y.
    if i % 2 == 0:
        idx = np.argmax(fdata[1])
    else:
        idx = np.argmin(fdata[1])
    fliers[i].set_data([fdata[0][idx]], [fdata[1][idx]])

Also note that the showfliers argument doesn't exist in matplotlib < 1.4, and that the whis argument doesn't accept lists in those versions.
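If the same script has to run on both old and new installations, one option is to pick the loop based on the installed version. The sketch below is a hypothetical helper that parses only the major and minor version numbers and assumes, as described above, that 1.4 is the cut-off where each box's fliers became a single Line2D.

```python
def fliers_are_single_line(version: str) -> bool:
    """Hypothetical helper: True if this matplotlib version (>= 1.4 assumed)
    stores each box's fliers in one Line2D rather than two."""
    # Only major.minor are compared, so suffixes such as "rc1" are ignored.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 4)
```

In practice you would call it as fliers_are_single_line(matplotlib.__version__) and run the first loop when it returns True, the even/odd loop otherwise.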

Of course, for simple applications you could also plot the boxplot without fliers and add the max and min points yourself:

import matplotlib.pyplot as plt

# data is a single NumPy array here, so there is one box at x = 1.
bp = plt.boxplot(data, whis=[5, 95], showfliers=False)
sc = plt.scatter([1, 1], [data.min(), data.max()])

where [1, 1] is the x-position of both points (a single box is drawn at x = 1).
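With several boxes the same idea needs one min/max pair per dataset, placed at each box's x-position. A minimal sketch, assuming the default box positions 1, 2, ..., n and using synthetic lognormal data for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Three synthetic datasets (assumed for illustration).
rng = np.random.default_rng(0)
data = [rng.lognormal(size=500) for _ in range(3)]

fig, ax = plt.subplots()
ax.boxplot(data, whis=[5, 95], showfliers=False)

# Boxes are drawn at x = 1, 2, ..., n unless `positions` is passed.
positions = np.arange(1, len(data) + 1)
mins = [d.min() for d in data]
maxs = [d.max() for d in data]

# Open circles, similar in look to the default flier markers.
low = ax.scatter(positions, mins, marker="o", facecolors="none", edgecolors="k")
high = ax.scatter(positions, maxs, marker="o", facecolors="none", edgecolors="k")
```

If you pass positions=... to boxplot, reuse the same values for the scatter x-coordinates.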

Answered Oct 10 '22 by Geotob.


import numpy as np

fliers = bp['fliers']
for box in fliers:  # one Line2D per box, holding that box's fliers
    xdata, ydata = box.get_xdata(), box.get_ydata()
    if len(ydata) == 0:  # skip boxes without fliers
        continue
    # All fliers of a box share the same x, so any value from xdata works.
    box.set_data([xdata[0], xdata[0]], [np.min(ydata), np.max(ydata)])

Resulting figure, showing only the max and min fliers: [image not reproduced]

Answered Oct 10 '22 by PJW.