I use a bar graph to indicate the data of each group. Some of these bars differ significantly from each other. How can I indicate the significant difference in the bar plot?
import numpy as np
import matplotlib.pyplot as plt
menMeans = (5, 15, 30, 40)
menStd = (2, 3, 4, 5)
ind = np.arange(4) # the x locations for the groups
width=0.35
p1 = plt.bar(ind, menMeans, width=width, color='r', yerr=menStd)
plt.xticks(ind+width/2., ('A', 'B', 'C', 'D') )
I am aiming for
For example, you can easily highlight specific points in a scatter plot, or you could add asterisks (“stars”, “*”) to a bar graph with a mouse click to denote statistical significance.
Start by looking at the left side of your degrees of freedom and find your variance. Then, go upward to see the p-values. Compare the p-value to the significance level or rather, the alpha. Remember that a p-value less than 0.05 is considered statistically significant.
I've done a couple of things here that I suggest when working with complex plots. Pull out the custom formatting into a dictionary, it makes life simple when you want to change a parameter - and you can pass this dictionary to multiple plots. I've also written a custom function to annotate
the itervalues, as a bonus it can annotate between (A,C)
if you really want to (I stand by my comment that this isn't the right visual approach however). It may need some tweaking once the data changes but this should put you on the right track.
import numpy as np
import matplotlib.pyplot as plt
menMeans = (5, 15, 30, 40)
menStd = (2, 3, 4, 5)
ind = np.arange(4) # the x locations for the groups
width= 0.7
labels = ('A', 'B', 'C', 'D')
# Pull the formatting out here
bar_kwargs = {'width':width,'color':'y','linewidth':2,'zorder':5}
err_kwargs = {'zorder':0,'fmt':None,'linewidth':2,'ecolor':'k'} #for matplotlib >= v1.4 use 'fmt':'none' instead
fig, ax = plt.subplots()
ax.p1 = plt.bar(ind, menMeans, **bar_kwargs)
ax.errs = plt.errorbar(ind, menMeans, yerr=menStd, **err_kwargs)
# Custom function to draw the diff bars
def label_diff(i,j,text,X,Y):
x = (X[i]+X[j])/2
y = 1.1*max(Y[i], Y[j])
dx = abs(X[i]-X[j])
props = {'connectionstyle':'bar','arrowstyle':'-',\
'shrinkA':20,'shrinkB':20,'linewidth':2}
ax.annotate(text, xy=(X[i],y+7), zorder=10)
ax.annotate('', xy=(X[i],y), xytext=(X[j],y), arrowprops=props)
# Call the function
label_diff(0,1,'p=0.0370',ind,menMeans)
label_diff(1,2,'p<0.0001',ind,menMeans)
label_diff(2,3,'p=0.0025',ind,menMeans)
plt.ylim(ymax=60)
plt.xticks(ind, labels, color='k')
plt.show()
The answer above inspired me to write a small but flexible function myself:
def barplot_annotate_brackets(num1, num2, data, center, height, yerr=None, dh=.05, barh=.05, fs=None, maxasterix=None):
"""
Annotate barplot with p-values.
:param num1: number of left bar to put bracket over
:param num2: number of right bar to put bracket over
:param data: string to write or number for generating asterixes
:param center: centers of all bars (like plt.bar() input)
:param height: heights of all bars (like plt.bar() input)
:param yerr: yerrs of all bars (like plt.bar() input)
:param dh: height offset over bar / bar + yerr in axes coordinates (0 to 1)
:param barh: bar height in axes coordinates (0 to 1)
:param fs: font size
:param maxasterix: maximum number of asterixes to write (for very small p-values)
"""
if type(data) is str:
text = data
else:
# * is p < 0.05
# ** is p < 0.005
# *** is p < 0.0005
# etc.
text = ''
p = .05
while data < p:
text += '*'
p /= 10.
if maxasterix and len(text) == maxasterix:
break
if len(text) == 0:
text = 'n. s.'
lx, ly = center[num1], height[num1]
rx, ry = center[num2], height[num2]
if yerr:
ly += yerr[num1]
ry += yerr[num2]
ax_y0, ax_y1 = plt.gca().get_ylim()
dh *= (ax_y1 - ax_y0)
barh *= (ax_y1 - ax_y0)
y = max(ly, ry) + dh
barx = [lx, lx, rx, rx]
bary = [y, y+barh, y+barh, y]
mid = ((lx+rx)/2, y+barh)
plt.plot(barx, bary, c='black')
kwargs = dict(ha='center', va='bottom')
if fs is not None:
kwargs['fontsize'] = fs
plt.text(*mid, text, **kwargs)
which allows me to get some nice annotations relatively simple, e.g.:
heights = [1.8, 2, 3]
bars = np.arange(len(heights))
plt.figure()
plt.bar(bars, heights, align='center')
plt.ylim(0, 5)
barplot_annotate_brackets(0, 1, .1, bars, heights)
barplot_annotate_brackets(1, 2, .001, bars, heights)
barplot_annotate_brackets(0, 2, 'p < 0.0075', bars, heights, dh=.2)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With