
Hard coding confidence interval as whiskers in bar plot

I calculated the confidence interval for a set of normally distributed data, and I want to plot it as whiskers on a bar chart of the data means. I tried using the yerr parameter of plt.bar, but that draws a standard-deviation error rather than the confidence interval. I want the same whisker visualization on the bar plot. The confidence intervals I have are:

[(29600.87 , 39367.28 ), ( 37101.74 , 42849.60 ), ( 33661.12 , 41470.25 ), ( 46019.20 , 49577.80)]

Here's my code. I tried feeding the yerr parameter with the confidence intervals, but it didn't work out so well.

import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# df and x_axis are defined elsewhere (not shown)
means = [np.mean(df.iloc[x]) for x in range(len(df.index))]
CI = [st.t.interval(0.95, len(df.iloc[x]) - 1, loc=np.mean(df.iloc[x]), scale=st.sem(df.iloc[x])) for x in range(len(df.index))]

plt.figure()
plt.bar(x_axis, means, color='r', yerr=np.reshape(CI, (2, 4)))
plt.xticks(np.arange(1992, 1996, 1))

Here's the plot I'm getting:

enter image description here

asked Mar 21 '17 06:03 by Majd Takidin


2 Answers

The following should do what you want (assuming that your errors are symmetric; if not then you should go with @ImportanceOfBeingErnest's answer); the plot would look like this:

enter image description here

The code that produces it with some inline comments:

import matplotlib.pyplot as plt

# rough estimates of your means; replace by your actual values
means = [34500, 40000, 37500, 47800]

# the confidence intervals you provided
ci = [(29600.87, 39367.28), (37101.74, 42849.60), (33661.12, 41470.25), (46019.20, 49577.80)]

# half-width of each confidence interval (upper bound minus mean);
# yerr must be non-negative, and this assumes symmetric errors
y_r = [ci[i][1] - means[i] for i in range(len(ci))]
plt.bar(range(len(means)), means, yerr=y_r, alpha=0.2, align='center')
plt.xticks(range(len(means)), [str(year) for year in range(1992, 1996)])
plt.show()
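
If the intervals really are symmetric, the means need not be estimated by eye. This holds for intervals produced by scipy.stats.t.interval as in the question, since they are centred on loc: each mean is the midpoint of its interval and the error is half its width. A minimal sketch under that assumption, using numpy (which the code above does not need); the names means_from_ci and half_width are illustrative only:

```python
import numpy as np

# the confidence intervals given in the question, one (lower, upper) row per bar
ci = np.array([(29600.87, 39367.28), (37101.74, 42849.60),
               (33661.12, 41470.25), (46019.20, 49577.80)])

# assumption: each interval is symmetric about its mean
means_from_ci = ci.mean(axis=1)           # midpoint of each interval
half_width = (ci[:, 1] - ci[:, 0]) / 2    # symmetric error for yerr
```

These arrays could then be passed as plt.bar(range(len(means_from_ci)), means_from_ci, yerr=half_width).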
answered Oct 05 '22 13:10 by Cleb


The yerr argument to bar can be used to draw the errors as error bars. The errors are defined as the deviation from some value, i.e. quantities are often given in the form y ± err. This means that the confidence interval would be (y-err, y+err).
This can be inverted; given a confidence interval (a, b) and a value y, the errors would be y-a and b-y.

In a matplotlib bar plot the error can be given as a scalar or as an N, Nx1 or 2xN array-like. Since we cannot know beforehand whether the y value lies symmetrically in the interval, and since this can differ between realizations (bars), we need to choose the 2 x N format here.

The code below shows how to do that.

import numpy as np
import matplotlib.pyplot as plt

# given some mean values and their confidence intervals,
means = np.array([30, 100, 60, 80])
conf  = np.array([[24, 35],[90, 110], [52, 67], [71, 88]])

# calculate the error
yerr = np.c_[means-conf[:,0],conf[:,1]-means ].T
print(yerr)  # prints [[ 6 10  8  9]
             #         [ 5 10  7  8]]

# and plot it on a bar chart
plt.bar(range(len(means)), means, yerr=yerr)
plt.xticks(range(len(means)))
plt.show()

enter image description here
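
The conversion can also be wrapped in a small helper that checks each mean lies inside its interval before plotting. This is only a sketch: ci_to_yerr and bar_with_ci are hypothetical names, not part of matplotlib, and capsize=5 is an arbitrary choice for the whisker caps.

```python
import numpy as np


def ci_to_yerr(means, intervals):
    """Convert (lower, upper) confidence bounds into the 2 x N yerr layout.

    Row 0 holds the distance from each mean down to its lower bound,
    row 1 the distance from each mean up to its upper bound.
    """
    means = np.asarray(means, dtype=float)
    intervals = np.asarray(intervals, dtype=float)
    if intervals.shape != (means.size, 2):
        raise ValueError("intervals must have shape (N, 2)")
    below = means - intervals[:, 0]
    above = intervals[:, 1] - means
    if (below < 0).any() or (above < 0).any():
        raise ValueError("each mean must lie inside its interval")
    return np.vstack([below, above])


def bar_with_ci(x, means, intervals):
    """Draw a bar chart with confidence-interval whiskers and return the Axes."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.bar(list(x), means, yerr=ci_to_yerr(means, intervals), capsize=5)
    ax.set_xticks(list(x))
    return ax
```

For the question's data this could be called as bar_with_ci(range(1992, 1996), means, CI), followed by plt.show(). Note that np.reshape(CI, (2, 4)) from the question does not produce this layout: it reads the (lower, upper) pairs row by row, so both rows mix lower and upper bounds, and the values are absolute bounds rather than distances from the mean.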

answered Oct 05 '22 13:10 by ImportanceOfBeingErnest