Obtaining the values used in a boxplot, using Python and matplotlib

I can draw a boxplot from data:

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(100)
plt.boxplot(data)

The box then spans the 25th to the 75th percentile, and the whiskers extend to the smallest and largest data points that lie within (25th percentile - 1.5*IQR, 75th percentile + 1.5*IQR), where IQR denotes the interquartile range. (The factor 1.5 is customizable through the whis argument of plt.boxplot().)

Now I want to know the values used in the boxplot, i.e. the median, the upper and lower quartiles, and the upper and lower whisker end points. The first three are easy to obtain with np.median() and np.percentile(), but the whisker end points require some verbose code:

median = np.median(data)
upper_quartile = np.percentile(data, 75)
lower_quartile = np.percentile(data, 25)

iqr = upper_quartile - lower_quartile
# Whisker ends: the most extreme points within 1.5*IQR of the box
upper_whisker = data[data <= upper_quartile + 1.5*iqr].max()
lower_whisker = data[data >= lower_quartile - 1.5*iqr].min()
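The same computation can be wrapped in a small reusable function that also exposes the 1.5 multiplier. This is a sketch: the name boxplot_values is hypothetical, and the whis argument only mirrors the parameter of the same name in plt.boxplot(). It relies on NumPy's default (linear) percentile interpolation, which should match matplotlib's default behaviour.

```python
import numpy as np


def boxplot_values(data, whis=1.5):
    """Hypothetical helper: the five values a default boxplot draws for 1-D data.

    Whiskers reach the most extreme data points that lie within
    [Q1 - whis*IQR, Q3 + whis*IQR].
    """
    data = np.asarray(data, dtype=float)
    lower_quartile, median, upper_quartile = np.percentile(data, [25, 50, 75])
    iqr = upper_quartile - lower_quartile
    # Fences: points beyond these limits are drawn as fliers (outliers).
    upper_fence = upper_quartile + whis * iqr
    lower_fence = lower_quartile - whis * iqr
    return {
        "median": median,
        "lower_quartile": lower_quartile,
        "upper_quartile": upper_quartile,
        # Most extreme points still inside the fences.
        "lower_whisker": data[data >= lower_fence].min(),
        "upper_whisker": data[data <= upper_fence].max(),
    }


# Small hand-checkable sample: 100 lies far above the rest.
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(boxplot_values(sample))           # upper whisker stops at 9
print(boxplot_values(sample, whis=25))  # very wide fences: whisker reaches 100
```

For the sample, Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the upper fence is 7.75 + 1.5*4.5 = 14.5; the largest point inside it is 9, and 100 becomes a flier.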

This works, but is there a neater way to do it? Since the boxplot has already been drawn, it seems the values should be available to pull out of it.
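One neater option, different from reading values off the drawn plot, is matplotlib.cbook.boxplot_stats, which computes the statistics matplotlib uses for boxplots without drawing anything. This sketch assumes a matplotlib version that ships boxplot_stats (the 1.4 series or later, as far as I know); it returns a list with one dict per dataset.

```python
import numpy as np
from matplotlib import cbook

data = np.random.rand(100)

# A 1-D array counts as a single dataset, so take the first (only) dict.
stats = cbook.boxplot_stats(data, whis=1.5)[0]

median = stats["med"]
lower_quartile, upper_quartile = stats["q1"], stats["q3"]
lower_whisker, upper_whisker = stats["whislo"], stats["whishi"]
outliers = stats["fliers"]  # points beyond the whiskers
print(median, lower_quartile, upper_quartile, lower_whisker, upper_whisker)
```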

asked May 04 '14 21:05 by Yuxiang Wang

1 Answer

Why do you want to do so? What you are doing is already pretty direct.

That said, if you want to fetch them from the plot once it has been made, simply use the get_ydata() method of the lines that plt.boxplot() returns:

B = plt.boxplot(data)
[item.get_ydata() for item in B['whiskers']]

This returns an array of shape (2,) for each whisker: the first element is the edge of the box (the quartile) and the second is the whisker end point, which is the value we want:

[item.get_ydata()[1] for item in B['whiskers']] 
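The same idea extends to all five values. This sketch assumes how matplotlib currently builds a default boxplot (vertical, not notched, patch_artist=False): the lower whisker comes first in B['whiskers'], the median line has both y-values equal to the median, and the box outline's y-values visit only Q1 and Q3. These layout details are implementation behaviour rather than documented API. The Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(100)
B = plt.boxplot(data)

# Each whisker's ydata is [box edge, whisker end]; the lower whisker is first.
lower_whisker, upper_whisker = [item.get_ydata()[1] for item in B["whiskers"]]
# The median line is horizontal, so both of its y-values equal the median.
median = B["medians"][0].get_ydata()[0]
# The box outline's y-values consist only of Q1 and Q3.
box_y = B["boxes"][0].get_ydata()
lower_quartile, upper_quartile = np.min(box_y), np.max(box_y)
# Points drawn beyond the whiskers.
outliers = B["fliers"][0].get_ydata()
```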
answered Oct 08 '22 12:10 by CT Zhu