
How does one insert statistical annotations (stars or p-values) into matplotlib / seaborn plots?

This seems like a trivial question, but I've been searching for a while and can't seem to find an answer. It also seems like something that should be a standard part of these packages. Does anyone know if there is a standard way to include statistical annotation between distribution plots in seaborn?

For example, between two box or swarmplots?

Example: the yellow distribution is significantly different from the others (by a Wilcoxon test). How can I display that visually?

cancerconnector asked Apr 12 '16 15:04



2 Answers

Here is how to add a statistical annotation to a Seaborn box plot:

import seaborn as sns, matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.boxplot(x="day", y="total_bill", data=tips, palette="PRGn")

# statistical annotation
x1, x2 = 2, 3   # columns 'Sat' and 'Sun' (first column: 0, see plt.xticks())
y, h, col = tips['total_bill'].max() + 2, 2, 'k'
plt.plot([x1, x1, x2, x2], [y, y+h, y+h, y], lw=1.5, c=col)
plt.text((x1+x2)*.5, y+h, "ns", ha='center', va='bottom', color=col)

plt.show()

And here is the result: [image: annotated box plot]
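The "ns" label above is hard-coded. If you would rather derive the label from a p-value, something like the following might work. This is only a sketch: the helper names `p_to_stars` and `annotate_pair` are made up for illustration, and the star thresholds follow a common convention rather than anything defined by matplotlib or seaborn.

```python
def p_to_stars(p):
    """Map a p-value to a star label.

    The thresholds follow a common convention and are an assumption here,
    not something defined by matplotlib or seaborn.
    """
    thresholds = [(1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (5e-2, "*")]
    for limit, label in thresholds:
        if p <= limit:
            return label
    return "ns"


def annotate_pair(ax, x1, x2, y, h, p, col="k"):
    """Draw a bracket between columns x1 and x2 at height y and label it for p.

    `ax` is expected to be a matplotlib Axes (only ax.plot and ax.text are used).
    """
    # Same bracket shape as in the answer: up by h, across, and back down.
    ax.plot([x1, x1, x2, x2], [y, y + h, y + h, y], lw=1.5, c=col)
    # Centre the star label just above the horizontal part of the bracket.
    ax.text((x1 + x2) * 0.5, y + h, p_to_stars(p),
            ha="center", va="bottom", color=col)
```

With the box plot above, this could be called as `annotate_pair(plt.gca(), x1, x2, y, h, p)` before `plt.show()`. The p-value itself has to come from a test of your choice, for example `scipy.stats.mannwhitneyu(a, b).pvalue`, where `a` and `b` would be the `total_bill` values of the two days being compared.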

Ulrich Stern answered Sep 21 '22 12:09


You may also want to add several annotations to different pairs of boxes. In that case, it helps to handle the vertical placement of the lines and labels automatically. Other contributors and I wrote a small function for this, add_stat_annotation in the statannot package (see the GitHub repo), which stacks the lines on top of each other without overlapping. Annotations can be placed either inside or outside the plot, and several statistical tests are implemented: Mann-Whitney and t-test (independent and paired). Here is a minimal example.

import matplotlib.pyplot as plt
import seaborn as sns
from statannot import add_stat_annotation

sns.set(style="whitegrid")
df = sns.load_dataset("tips")

x = "day"
y = "total_bill"
order = ['Sun', 'Thur', 'Fri', 'Sat']
ax = sns.boxplot(data=df, x=x, y=y, order=order)
add_stat_annotation(ax, data=df, x=x, y=y, order=order,
                    box_pairs=[("Thur", "Fri"), ("Thur", "Sat"), ("Fri", "Sun")],
                    test='Mann-Whitney', text_format='star', loc='outside', verbose=2)

[image: example1]

x = "day"
y = "total_bill"
hue = "smoker"
ax = sns.boxplot(data=df, x=x, y=y, hue=hue)
add_stat_annotation(ax, data=df, x=x, y=y, hue=hue,
                    box_pairs=[(("Thur", "No"), ("Fri", "No")),
                               (("Sat", "Yes"), ("Sat", "No")),
                               (("Sun", "No"), ("Thur", "Yes"))
                              ],
                    test='t-test_ind', text_format='full', loc='inside', verbose=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))

[image: example2]
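For anyone curious how stacking several brackets without overlap can work in principle, here is a rough sketch. It is not statannot's actual implementation, just one simple greedy approach, and the names `assign_levels` and `bracket_heights` are hypothetical. Each pair is given as the x positions of its two boxes, and brackets that share an endpoint are treated as overlapping.

```python
def assign_levels(pairs):
    """Give each (x1, x2) pair the lowest level where it overlaps no placed bracket.

    Narrow brackets are placed first so they end up closest to the boxes.
    Returns a list of levels (0 = lowest) in the same order as `pairs`.
    """
    spans = [(min(a, b), max(a, b)) for a, b in pairs]
    # Visit the pairs from narrowest to widest (stable for equal widths).
    order = sorted(range(len(spans)), key=lambda i: spans[i][1] - spans[i][0])
    placed = []                   # (left, right, level) of brackets placed so far
    levels = [None] * len(spans)
    for i in order:
        left, right = spans[i]
        # Levels already taken by brackets whose x-range touches this one.
        used = {lvl for lo, hi, lvl in placed if left <= hi and lo <= right}
        level = 0
        while level in used:
            level += 1
        levels[i] = level
        placed.append((left, right, level))
    return levels


def bracket_heights(pairs, y_base, step):
    """Turn levels into y-coordinates: y_base for level 0, plus `step` per level."""
    return [y_base + step * lvl for lvl in assign_levels(pairs)]
```

For the first example above, with order = ['Sun', 'Thur', 'Fri', 'Sat'], the three box pairs sit at positions (1, 2), (1, 3) and (2, 0), and `assign_levels` returns [0, 1, 2], so each bracket gets its own row. The resulting heights can then be passed as the `y` argument of a bracket-drawing call like the `plt.plot` one in the other answer.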

fokkerplanck answered Sep 18 '22 12:09