Is there a good way to display sample size on grouped boxplots using Python Matplotlib?

I could get the sample sizes with groupby and add text at the corresponding locations, but I can't help thinking there's a better way: this seems like a routine thing that many people would want to see.

To illustrate, the following code generates a grouped boxplot:

import pandas as pd
from numpy.random import rand

df = pd.DataFrame(rand(100, 1), columns=['value'])
# .loc slices by label and includes the end label, so rows 0-23 become class A
df.loc[:23, 'class'] = 'A'
df.loc[24:, 'class'] = 'B'
df.boxplot(column='value', by='class')

What I'd like is to show the sample size of each class, A and B, namely 24 and 76 respectively. It could appear as a legend or somewhere near the boxes; either is OK with me.
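For reference, the groupby-and-text workaround mentioned above might look roughly like the sketch below. It assumes that pandas draws the boxes at x positions 1, 2, ... in sorted class order; the seeded generator `rng`, the way the classes are assigned, and the 5-point text offset are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same layout as the example above: 24 rows of class A, 76 of class B.
# The seeded generator is an assumption, used only for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({'value': rng.random(100),
                   'class': ['A'] * 24 + ['B'] * 76})

ax = df.boxplot(column='value', by='class')

# groupby sorts the class keys, which matches the left-to-right box order.
counts = df.groupby('class')['value'].count()

# Boxes are assumed to sit at x = 1, 2, ...; write n just above each
# class's largest value.
for pos, (name, n) in enumerate(counts.items(), start=1):
    top = df.loc[df['class'] == name, 'value'].max()
    ax.annotate('n=%d' % n, xy=(pos, top), xytext=(0, 5),
                textcoords='offset points', ha='center', va='bottom')
```

Call `plt.show()` to display the figure. The offset and anchor point will likely need adjusting for real data, for example to keep the labels clear of the title.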

Thanks!

asked Mar 26 '15 18:03

Tian He

1 Answer

This puts n in the class tick labels. I tried it as a legend, but I didn't think that was as clear. R has many more boxplot options, including making the width of the boxes proportional to sample size; that is not a default in matplotlib, but it is easy to do and seems really readable:

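import pandas as pd
from numpy.random import rand, randint

df = pd.DataFrame(rand(100, 1), columns=['value'])

# Two random cut points split the rows into three classes of unequal size
cut1 = randint(2, 47)
cut2 = randint(52, 97)
# .loc replaces the removed .ix indexer; label slices include the end label
df.loc[:cut1, 'class'] = 'A'
df.loc[cut1+1:cut2, 'class'] = 'B'
df.loc[cut2+1:, 'class'] = 'C'

dfg = df.groupby('class')

# Sample size per class, in the same sorted order the boxes are drawn
counts = [len(v) for k, v in dfg]
total = float(sum(counts))

# Box widths proportional to each class's share of the samples
widths = [c/total for c in counts]

cax = df.boxplot(column='value', by='class', widths=widths)
# Put n under each class name in the tick labels
cax.set_xticklabels(['%s\n$n$=%d'%(k, len(v)) for k, v in dfg])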
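For comparison, the legend variant mentioned above could be sketched as follows. This is only one possible way to do it: it uses invisible `Line2D` proxy handles so that the legend shows text alone, and the fixed class sizes (24, 46, 30) and the seeded generator `rng` are assumptions made for reproducibility.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Fixed class sizes (an assumption) so the legend text is predictable.
rng = np.random.default_rng(0)
df = pd.DataFrame({'value': rng.random(100),
                   'class': ['A'] * 24 + ['B'] * 46 + ['C'] * 30})

ax = df.boxplot(column='value', by='class')

# One legend entry per class; the handles draw nothing, only the labels show.
counts = df.groupby('class').size()
handles = [Line2D([], [], linestyle='none', label='%s: n=%d' % (name, n))
           for name, n in counts.items()]
legend = ax.legend(handles=handles, handlelength=0, handletextpad=0,
                   loc='best')
```

Because the boxes are not color-coded, the reader has to match legend entries to boxes by name, which may be why the tick-label version reads more clearly.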

answered Oct 14 '22 16:10

cphlewis

Tags: python, matplotlib, boxplot, sample-size
