I have a pandas data frame df that has four columns: Candidate, Sample_Set, Values, and Error. The Candidate column has, say, three unique entries: [X, Y, Z] and we have three sample sets, such that Sample_Set has three unique values as well: [1,2,3]. The df would roughly look like this.
import pandas as pd
data = {'Candidate': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X', 'Y', 'Z'],
        'Sample_Set': [1, 1, 1, 2, 2, 2, 3, 3, 3],
        'Values': [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
        'Error': [5, 2, 3, 30, 30, 30, 10, 10, 10]}
df = pd.DataFrame(data)
# display(df)
  Candidate  Sample_Set  Values  Error
0         X           1      20      5
1         Y           1      10      2
2         Z           1      10      3
3         X           2     200     30
4         Y           2     101     30
5         Z           2      99     30
6         X           3    1999     10
7         Y           3     998     10
8         Z           3    1003     10
I am using seaborn to create a grouped barplot out of this with x="Candidate", y="Values", hue="Sample_Set". All's good, until I try to add an error bar along the y-axis using the values under the column named Error. I am using the following code.
import seaborn as sns
ax = sns.factorplot(x="Candidate", y="Values", hue="Sample_Set", data=df,
                    size=8, kind="bar")
How do I incorporate the error?
I would appreciate a solution, or a more elegant approach to this task.
As @ResMar pointed out in the comments, there seems to be no built-in functionality in seaborn to easily set individual errorbars.
If you care more about the result than about how you get there, the following (not so elegant) solution might be helpful. It builds on matplotlib.pyplot.bar; the seaborn import is only used to get the same style.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
def grouped_barplot(df, cat, subcat, val, err):
    u = df[cat].unique()            # categories, one bar group each
    x = np.arange(len(u))           # group centers on the x-axis
    subx = df[subcat].unique()      # subcategories, one bar per group each
    # horizontal shift of each subcategory's bar, centered on the group
    offsets = (np.arange(len(subx)) - np.arange(len(subx)).mean()) / (len(subx) + 1.)
    width = np.diff(offsets).mean()
    for i, gr in enumerate(subx):
        # rows of this subcategory; assumes the same category order as u
        dfg = df[df[subcat] == gr]
        plt.bar(x + offsets[i], dfg[val].values, width=width,
                label="{} {}".format(subcat, gr), yerr=dfg[err].values)
    plt.xlabel(cat)
    plt.ylabel(val)
    plt.xticks(x, u)
    plt.legend()
    plt.show()

cat = "Candidate"
subcat = "Sample_Set"
val = "Values"
err = "Error"

# call the function with df from the question
grouped_barplot(df, cat, subcat, val, err)
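The offset arithmetic in the function is easier to follow with concrete numbers. The sketch below repeats just that calculation for three subcategories; the helper name `bar_offsets` is made up for illustration and is not part of the answer's code.

```python
import numpy as np

def bar_offsets(n_sub):
    """Return (offsets, width) as computed inside grouped_barplot.

    Hypothetical helper, shown only to illustrate the arithmetic.
    """
    idx = np.arange(n_sub)
    # Center the indices on zero, then squeeze them into the group's slot.
    offsets = (idx - idx.mean()) / (n_sub + 1.0)
    # Bars are as wide as the spacing between neighbouring offsets,
    # so bars within a group touch and a gap remains between groups.
    width = np.diff(offsets).mean()
    return offsets, width

offsets, width = bar_offsets(3)
print(offsets)  # [-0.25  0.    0.25]
print(width)    # 0.25
```

With three subcategories each group therefore spans 0.75 units around its tick and leaves a 0.25 gap to the next group. Note that with a single subcategory `np.diff` returns an empty array, so `width` comes out as `nan`; the function as written assumes at least two subcategories.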

Note that by simply swapping the category and subcategory
cat = "Sample_Set"
subcat = "Candidate"
you get a different grouping, with the bars grouped by sample set and one bar per candidate in each group.
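As a possibly shorter route to the same kind of figure, pandas' own bar plot accepts a DataFrame of errors whose column labels match the plotted columns. This is an alternative to the function above, not the method used in this answer, and it assumes every Candidate/Sample_Set pair occurs exactly once (otherwise `pivot` raises an error). A minimal sketch using the data from the question:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Candidate": ["X", "Y", "Z", "X", "Y", "Z", "X", "Y", "Z"],
    "Sample_Set": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Values": [20, 10, 10, 200, 101, 99, 1999, 998, 1003],
    "Error": [5, 2, 3, 30, 30, 30, 10, 10, 10],
})

# Reshape to wide form: one row per Candidate, one column per Sample_Set.
values = df.pivot(index="Candidate", columns="Sample_Set", values="Values")
errors = df.pivot(index="Candidate", columns="Sample_Set", values="Error")

# pandas draws one bar group per row and pairs each bar with the
# error value found under the same column label in `errors`.
ax = values.plot(kind="bar", yerr=errors, capsize=3, rot=0)
ax.set_ylabel("Values")
```

Call `plt.show()` (or `ax.figure.savefig(...)`) afterwards to display or save the figure. Swapping `index` and `columns` in both `pivot` calls gives the other grouping.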
