When using box plots in Python, is there any way to automatically/easily overlay the value of the median & variance on top of each box (or at least the numerical value of the median)?
E.g. in the boxplot below, I would like to overlay the text (median, +- std) on each box plot.
To compare two box plots with overlapping boxes and medians, calculate the Distance Between Medians as a percentage of the Overall Visible Spread. Keep in mind that box plots show how the values are spread, not how many data points there are. A skewed box plot (a median line that sits off-center in the box, or whiskers of unequal length) suggests that the data might not follow a normal distribution.
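The comparison described above can be sketched in a few lines of NumPy. This is only an illustration: the helper name dbm_over_ovs and the sample data are made up, and it assumes the usual classroom definition of the Overall Visible Spread, namely the distance from the lowest lower quartile to the highest upper quartile of the two groups.

```python
import numpy as np


def dbm_over_ovs(a, b):
    """Distance between medians as a percentage of the overall visible spread.

    Hypothetical helper; assumes OVS = highest upper quartile minus
    lowest lower quartile across the two groups.
    """
    q1_a, med_a, q3_a = np.percentile(a, [25, 50, 75])
    q1_b, med_b, q3_b = np.percentile(b, [25, 50, 75])
    dbm = abs(med_a - med_b)                   # distance between medians
    ovs = max(q3_a, q3_b) - min(q1_a, q1_b)    # span covered by the two boxes
    return 100.0 * dbm / ovs


# Made-up example data: two groups shifted by 3 units
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
group_b = [4, 5, 6, 7, 8, 9, 10, 11, 12]
ratio = dbm_over_ovs(group_a, group_b)
print("DBM as %% of OVS: %.1f" % ratio)
```

Here the medians are 5 and 8 and the boxes together span 3 to 10, so the ratio is 3/7, or about 43%. The helper does not guard against both boxes having zero height, which would make the spread zero.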
Strictly, all you can read off a boxplot about the variability of a distribution are its interquartile range (the length or height of the box) and range (the length or height between the extremes of the display).
The median is the middle value of the sorted data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value and half are less than or equal to it.
On a horizontal box plot, the far left of the chart (at the end of the left “whisker”) is the minimum (the smallest number in the set) and the far right is the maximum (the largest number in the set). When outliers are drawn as separate points, the whiskers stop at the most extreme values that are not outliers. Finally, the median is represented by a vertical bar inside the box, not necessarily at its center. Box plots aren't used that much in real life.
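To make the quantities a box plot shows concrete, here is a small sketch that computes them directly with NumPy. The data are made up, with one large value included on purpose to show that the median and the mean can differ a lot.

```python
import numpy as np

# Made-up sample with one large outlier
values = np.array([2, 4, 4, 5, 7, 9, 10, 12, 100])

# Five-number summary: minimum, lower quartile, median, upper quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(values, [0, 25, 50, 75, 100])

iqr = q3 - q1                   # length of the box
data_range = maximum - minimum  # distance between the extremes
mean = values.mean()            # pulled upward by the outlier, unlike the median

print("min=%g q1=%g median=%g q3=%g max=%g" % (minimum, q1, median, q3, maximum))
print("IQR=%g range=%g mean=%g" % (iqr, data_range, mean))
```

With these numbers the median is 7 while the mean is 17. Note that matplotlib's boxplot, with its default settings, would end the upper whisker at 12 and draw 100 as a separate flier, because 100 lies more than 1.5 times the IQR above the box.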
Assuming you are using the boxplot function to draw the boxplots, it returns a dictionary that holds the components of the graph. Note that the box represents the interquartile range (25th to 75th percentile) and not the standard deviation.
>>> bp_dict = boxplot(data, vert=False) # draw horizontal boxplot
>>> bp_dict.keys()
['medians', 'fliers', 'whiskers', 'boxes', 'caps']
These contain the Line2D objects that form each of the plot elements. You can use the Line2D.get_xydata method to get the median and box positions (in data coords) to figure out where to position your text.
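As a quick check of what get_xydata returns, the sketch below draws a single box and compares the coordinates with the quartiles computed by NumPy. It uses a vertical box (the default orientation) and made-up random data; for a horizontal box the x and y columns swap. The vertex order of the box outline is what I see from matplotlib with the default patch_artist=False, so treat it as an observation rather than a documented guarantee.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 200 draws from a normal distribution
rng = np.random.default_rng(0)
samples = rng.normal(50, 10, size=200)

fig, ax = plt.subplots()
bp = ax.boxplot(samples)  # vertical box, default settings

# Median line: two points (left end, right end), both at y = median
median_xy = bp["medians"][0].get_xydata()

# Box outline: five points forming a closed rectangle;
# the first two sit at the lower quartile, the next two at the upper quartile
box_xy = bp["boxes"][0].get_xydata()

q1, q3 = np.percentile(samples, [25, 75])
print("median from plot:", median_xy[0, 1], "numpy:", np.median(samples))
print("q1 from plot:", box_xy[0, 1], "numpy:", q1)
print("q3 from plot:", box_xy[2, 1], "numpy:", q3)
```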
from pylab import *
# from http://matplotlib.org/examples/pylab_examples/boxplot_demo.html
# fake up some data
spread = rand(50) * 100
center = ones(25) * 50
flier_high = rand(10) * 100 + 100
flier_low = rand(10) * -100
data = concatenate((spread, center, flier_high, flier_low), 0)
# fake up some more data
spread = rand(50) * 100
center = ones(25) * 40
flier_high = rand(10) * 100 + 100
flier_low = rand(10) * -100
d2 = concatenate((spread, center, flier_high, flier_low), 0)
data.shape = (-1, 1)
d2.shape = (-1, 1)
#data = concatenate( (data, d2), 1 )
# Making a 2-D array only works if all the columns are the
# same length. If they are not, then use a list instead.
# This is actually more efficient because boxplot converts
# a 2-D array into a list of vectors internally anyway.
data = [data, d2, d2[::2,0]]
# multiple box plots on one figure
figure()
# get dictionary returned from boxplot
bp_dict = boxplot(data, vert=False)
for line in bp_dict['medians']:
    # get position data for median line
    x, y = line.get_xydata()[1] # top of median line
    # overlay median value
    text(x, y, '%.1f' % x,
         horizontalalignment='center') # draw above, centered
for line in bp_dict['boxes']:
    x, y = line.get_xydata()[0] # bottom of left line
    text(x, y, '%.1f' % x,
         horizontalalignment='center', # centered
         verticalalignment='top')      # below
    x, y = line.get_xydata()[3] # bottom of right line
    text(x, y, '%.1f' % x,
         horizontalalignment='center', # centered
         verticalalignment='top')      # below
show()
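The question also asked for the standard deviation, which the box plot itself does not contain, so it has to be computed from the raw data. The following is a minimal sketch of that idea using the matplotlib.pyplot interface and vertical boxes. The data, the variable names and the "median +/- std" label format are all made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: three groups with different centers
rng = np.random.default_rng(42)
groups = [rng.normal(loc, 10, size=100) for loc in (40, 50, 60)]

fig, ax = plt.subplots()
bp = ax.boxplot(groups)  # vertical boxes, one per group

labels = []
for values, median_line in zip(groups, bp["medians"]):
    # End points of the median line, in data coordinates
    (x_left, y_med), (x_right, _) = median_line.get_xydata()
    # The std is not part of the plot, so compute it from the data
    label = "%.1f +/- %.1f" % (np.median(values), np.std(values, ddof=1))
    # Place the label just above the middle of the median line
    ax.text((x_left + x_right) / 2, y_med, label,
            horizontalalignment="center", verticalalignment="bottom")
    labels.append(label)

# Call plt.show() here to display the figure in an interactive session
```

Using ddof=1 gives the sample standard deviation; drop it if you want the population value.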