Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a good way to display sample size on grouped boxplots using Python Matplotlib

I could get the size info using groupby and add text to the corresponding location. But I can't help thinking there's a better way as this really seems mundane, something many people would like to see...

To illustrate, the following code would generate a grouped boxplot

import pandas as pd
df = pd.DataFrame(rand(100, 1), columns=['value'])
df.ix[:23, 'class']='A'
df.ix[24:, 'class']='B'
df.boxplot(column='value', by='class')

boxplot What I'd like is to show the sample size of each class A and B, namely 24 and 76 respectively. It could appear as legend or somewhere near the boxes, either is ok with me.

Thanks!

like image 484
Tian He Avatar asked Mar 26 '15 18:03

Tian He


People also ask

Can you tell sample size from a box plot?

Sample size differences can be assessed by scaling the box plot width in proportion to √n (Fig. 1b), the factor by which the pre- cision of the sample's estimate of population statistics improves as sample size is increased. To assist in judging differences between sample medians, a notch (Fig.

How do I increase the size of a boxplot in MatPlotLib?

Steps. Set the figure size and adjust the padding between and around the subplots. Make a Pandas dataframe, i.e., two-dimensional, size-mutable, potentially heterogeneous tabular data. Make a box and whisker plot, using boxplot() method with width tuple to adjust the box in boxplot.

What is MatPlotLib PyLab?

MatPlotLib with Python PyLab is a procedural interface to the Matplotlib object-oriented plotting library. Matplotlib is the whole package; matplotlib. pyplot is a module in Matplotlib; and PyLab is a module that gets installed alongside Matplotlib. PyLab is a convenience module that bulk imports matplotlib.


1 Answers

n in the class ticklabels. I tried it as a legend but I didn't think it was as clear. R has a lot more boxplot options, including making the width of the boxes proportional to sample size; not a default in matplotlib but easy and seems really readable:

import pandas as pd
from numpy.random import rand, randint

df = pd.DataFrame(rand(100, 1), columns=['value'])

cut1 = randint(2,47)
cut2 = randint(52, 97)
df.ix[:cut1, 'class']='A'
df.ix[cut1+1:cut2, 'class']='B'
df.ix[cut2+1:, 'class'] = 'C'

dfg = df.groupby('class')

counts = [len(v) for k, v in dfg]
total = float(sum(counts))
cases = len(counts)

widths = [c/total for c in counts]  

cax = df.boxplot(column='value', by='class', widths=widths)
cax.set_xticklabels(['%s\n$n$=%d'%(k, len(v)) for k, v in dfg])

enter image description here

like image 96
cphlewis Avatar answered Oct 14 '22 16:10

cphlewis