 

Overlaying the numeric value of median/variance in boxplots

When using box plots in Python, is there an easy way to automatically overlay the median and variance values on top of each box (or at least the numerical value of the median)?

For example, in the boxplot below, I would like to overlay the text (median ± std) on each box.

[image: example boxplot]

Josh asked Sep 17 '13 22:09





1 Answer

Assuming you are using matplotlib's boxplot function to draw the boxplots, it returns a dictionary that holds the components of the graph. Note that the box represents the interquartile range (25th to 75th percentile), not the standard deviation.
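The point that the box marks the quartiles rather than mean ± std can be checked directly. This is a minimal sketch assuming a recent matplotlib and NumPy; it uses the object-oriented `ax.boxplot` (the same function the pylab `boxplot` call wraps) in the default vertical orientation, so the quartiles appear in the y coordinates. The random sample is made up for illustration, and `matplotlib.use("Agg")` is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=200)  # made-up, roughly normal data

fig, ax = plt.subplots()
bp = ax.boxplot(sample)  # one vertical box; its outline is a Line2D

box_y = bp["boxes"][0].get_xydata()[:, 1]  # y coordinates of the box outline
q1, q3 = np.percentile(sample, [25, 75])   # the quartiles boxplot uses
mean, std = sample.mean(), sample.std()

print(f"box edges:    {box_y.min():.2f} .. {box_y.max():.2f}")
print(f"Q1 .. Q3:     {q1:.2f} .. {q3:.2f}")
print(f"mean +/- std: {mean - std:.2f} .. {mean + std:.2f}")
```

The box edges match Q1 and Q3; for roughly normal data that range is about mean ± 0.67 std, so the box is noticeably narrower than a ±1 std band.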

>>> bp_dict = boxplot(data, vert=False) # draw horizontal boxplot
>>> bp_dict.keys()
['medians', 'fliers', 'whiskers', 'boxes', 'caps']

Each entry is a list of the Line2D objects that form the corresponding plot elements. You can use the Line2D.get_xydata method to get the median and box positions (in data coordinates) and work out where to position your text.
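To see what `get_xydata` hands back, the sketch below prints the points of one median line and one box. It assumes a recent matplotlib with the default vertical orientation, so the median line is horizontal and its y values equal the median; with `vert=False`, as in the code below, x and y are swapped, which is why that code reads the value from x. The samples are hypothetical, chosen so the quartiles are easy to check by hand (1..9 has Q1 = 3, median = 5, Q3 = 7).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Two small, made-up samples of different lengths
samples = [np.arange(1, 10), np.arange(5, 30)]

fig, ax = plt.subplots()
bp = ax.boxplot(samples)  # default vertical orientation, boxes at x = 1 and x = 2

# Each element is a Line2D; get_xydata() returns an (N, 2) array of (x, y) points
med_xy = bp["medians"][0].get_xydata()  # the two ends of the first median line
box_xy = bp["boxes"][0].get_xydata()    # the corners of the first box outline

print("median line:\n", med_xy)
print("box outline:\n", box_xy)
```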

from pylab import *

# from http://matplotlib.org/examples/pylab_examples/boxplot_demo.html

# fake up some data
spread = rand(50) * 100
center = ones(25) * 50
flier_high = rand(10) * 100 + 100
flier_low = rand(10) * -100
data = concatenate((spread, center, flier_high, flier_low), 0)

# fake up some more data
spread = rand(50) * 100
center = ones(25) * 40
flier_high = rand(10) * 100 + 100
flier_low = rand(10) * -100
d2 = concatenate((spread, center, flier_high, flier_low), 0)
# Pass a list of 1-D arrays instead of a 2-D array: a 2-D array only
# works if all the columns are the same length, whereas a list can hold
# samples of different lengths. This is also efficient, because boxplot
# converts a 2-D array into a list of vectors internally anyway.
# Keep each sample 1-D: recent matplotlib versions raise an error for
# (N, 1) arrays inside the list.
data = [data, d2, d2[::2]]

# multiple box plots on one figure
figure()

# get dictionary returned from boxplot
bp_dict = boxplot(data, vert=False)

for line in bp_dict['medians']:
    # get position data for median line
    x, y = line.get_xydata()[1] # top of median line
    # overlay median value
    text(x, y, '%.1f' % x,
         horizontalalignment='center') # draw above, centered

for line in bp_dict['boxes']:
    x, y = line.get_xydata()[0]  # bottom of left edge (Q1, 25th percentile)
    text(x, y, '%.1f' % x,
         horizontalalignment='center',  # centered
         verticalalignment='top')       # below
    x, y = line.get_xydata()[3]  # bottom of right edge (Q3, 75th percentile)
    text(x, y, '%.1f' % x,
         horizontalalignment='center',  # centered
         verticalalignment='top')       # below

show()

[image: boxplot output]
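The question also asks for a "median ± std" label. boxplot does not compute a standard deviation, so one option is to take it from the data with NumPy and place the text at the end of each median line. This is a sketch, not the answer's own code: it assumes a recent matplotlib, uses the default vertical orientation, made-up samples, and the sample standard deviation (`ddof=1`), which is a choice rather than anything boxplot prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Made-up samples of different lengths, passed as a list of 1-D arrays
samples = [rng.normal(50, 10, 100), rng.normal(40, 20, 60), rng.normal(60, 5, 30)]

fig, ax = plt.subplots()
bp = ax.boxplot(samples)  # vertical boxes at x = 1, 2, 3

labels = []
for sample, median_line in zip(samples, bp["medians"]):
    med = np.median(sample)
    # boxplot does not compute a standard deviation, so take it from the data
    std = np.std(sample, ddof=1)  # sample std; ddof=1 is an assumption here
    # Right-hand end of the horizontal median line, in data coordinates
    x, y = median_line.get_xydata()[1]
    label = f"{med:.1f} ± {std:.1f}"
    # Place the label just right of the box, vertically centered on the median
    ax.text(x, y, label, horizontalalignment="left", verticalalignment="center")
    labels.append(label)

fig.savefig("boxplot_median_std.png")  # or plt.show() with an interactive backend
```

For a horizontal plot (`vert=False`) the same idea applies with x and y swapped, as in the answer's code: take the top end of the median line and center the label above it.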

Greg Whittier answered Oct 09 '22 07:10
