
How to base seaborn boxplot whiskers on percentiles?

I am using a boxplot to show differences in the distribution of values between groups. The lower (25th) and upper (75th) percentiles and the median are indicative of the distribution and the main differences between groups. The whiskers, however, are less clear. By default in matplotlib and seaborn, the whiskers of a boxplot extend past the box by a multiple (default: 1.5) of the interquartile range (IQR), which is the range of values covered by the inner box. Points outside this range are identified as outliers. Both seaborn and matplotlib document the same parameter for changing the location of the whiskers:

whis : float
Proportion of the IQR past the low and high quartiles to extend the plot whiskers. Points outside this range will be identified as outliers.

For example:

boxplots = ax.boxplot(myData, whis=1.5)
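To make the IQR rule concrete, here is a minimal sketch of how the whisker ends follow from `whis`. It uses only the standard library; the helper name `iqr_whiskers` and the sample data are made up for illustration, and the quartile interpolation is assumed (not checked here) to match what the plotting library computes.

```python
from statistics import quantiles


def iqr_whiskers(values, whis=1.5):
    """Return (low_whisker, high_whisker, outliers) under the IQR rule."""
    data = sorted(values)
    # 'inclusive' interpolates linearly between data points; assumed here
    # to agree with the quartiles used by the plotting library.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points inside the fences;
    # everything beyond the fences is reported as an outlier.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return inside[0], inside[-1], outliers


my_data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # illustrative data only
low, high, outliers = iqr_whiskers(my_data, whis=1.5)
print(low, high, outliers)  # 1 9 [30]
```

Here Q1 = 3.25 and Q3 = 7.75, so the fences sit at -3.5 and 14.5 and only the value 30 falls outside them.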

Alternatively, matplotlib also allows basing the whiskers on percentiles. This works better for the story I am trying to tell with my data. For example:

boxplots = ax.boxplot(myData, whis=[5, 95])
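As a rough sketch of what percentile-based whiskers mean, the limits can be computed by hand with the standard library. The helper `percentile_whiskers` is hypothetical, it only handles whole-number percentiles from 1 to 99, and its interpolation is assumed to match the plotting library's.

```python
from statistics import quantiles


def percentile_whiskers(values, low_pct=5, high_pct=95):
    """Return (low_whisker, high_whisker, outliers) for percentile limits."""
    data = sorted(values)
    # quantiles(..., n=100) returns the 1st to 99th percentiles,
    # so index k - 1 holds the k-th percentile.
    cuts = quantiles(data, n=100, method="inclusive")
    low_limit, high_limit = cuts[low_pct - 1], cuts[high_pct - 1]
    # Whiskers end at the most extreme data points inside the limits.
    inside = [v for v in data if low_limit <= v <= high_limit]
    outliers = [v for v in data if v < low_limit or v > high_limit]
    return inside[0], inside[-1], outliers


my_data = list(range(101))  # illustrative data: 0, 1, ..., 100
low, high, outliers = percentile_whiskers(my_data, 5, 95)
print(low, high, len(outliers))  # 5 95 10
```

With these limits, roughly 5% of the points fall beyond each whisker regardless of the shape of the distribution.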

In contrast to matplotlib, whis=[5, 95] does not work in Seaborn. Now I am looking for a way to define the whiskers in Seaborn based on percentiles.

My first idea was to take the whisker values that matplotlib computes from the percentiles and find the IQR multiple that produces the same whiskers. This is what I did:

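import numpy as np
import matplotlib.pyplot as plt

r = 3  # number of decimals used when comparing whisker positions

for w in np.arange(0.00, 2.00, 0.01):
    fig, ax = plt.subplots(ncols=2, nrows=1, figsize=(8, 6))
    bp = ax[0].boxplot(myData, whis=[5, 95])  # percentile-based whiskers
    ax[0].set_xlabel('bp')
    ap = ax[1].boxplot(myData, whis=w)  # whiskers at w times the IQR
    ax[1].set_xlabel('ap')

    # The y-data of each whisker line is [box edge, whisker end]
    alo = np.round(bp['whiskers'][0].get_ydata(), r)
    blo = np.round(ap['whiskers'][0].get_ydata(), r)
    ahi = np.round(bp['whiskers'][1].get_ydata(), r)
    bhi = np.round(ap['whiskers'][1].get_ydata(), r)

    plt.close(fig)

    # Report w only when both the lower and the upper whiskers match
    if np.array_equal(alo, blo) and np.array_equal(ahi, bhi):
        print(w, "|", alo[1], "=", blo[1], '&', ahi[1], "=", bhi[1])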

The problem, however, is that this only works when the data follow a perfectly normal distribution, which mine do not. So I would really like to find a way to use percentiles for the whiskers in Seaborn boxplots. Is there any way to do this?
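A small sketch can show why a single IQR multiple cannot reproduce percentile whiskers for skewed data. It works with theoretical quantile functions rather than real data. The helper `whis_multipliers` is hypothetical, and the exponential distribution is only an assumed stand-in for "some skewed distribution", not the data from the question.

```python
from math import log
from statistics import NormalDist


def whis_multipliers(ppf, low_pct=5, high_pct=95):
    """IQR multiples (lower, upper) that would place the whiskers at the
    given percentiles, for a distribution with quantile function ppf."""
    q1, q3 = ppf(0.25), ppf(0.75)
    iqr = q3 - q1
    lower = (q1 - ppf(low_pct / 100)) / iqr
    upper = (ppf(high_pct / 100) - q3) / iqr
    return lower, upper


# Standard normal: symmetric, so both sides need the same multiple.
normal = whis_multipliers(NormalDist().inv_cdf)

# Exponential(1), quantile function -ln(1 - p): right-skewed.
skewed = whis_multipliers(lambda p: -log(1 - p))

print(normal)
print(skewed)
```

For the normal distribution both multiples come out at roughly 0.72, so one value of whis can match both whiskers. For the exponential they are roughly 0.22 and 1.46, so no single whis value can match the 5th and the 95th percentile at the same time.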

Hauzero asked May 27 '18 16:05



1 Answer

Seaborn seems to work the same as matplotlib in this regard:

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")
ax = sns.boxplot(x=tips["total_bill"], whis=[5, 95])
plt.grid(True)

[Image: Seaborn boxplot]

plt.boxplot(tips["total_bill"], whis=[5, 95], vert=False)
plt.grid(True)


I guess seaborn just passes whis through to the matplotlib method. The docstring might have been copied from an earlier version of matplotlib.

Stop harming Monica answered Oct 31 '22 09:10