
Error Bars with Seaborn and Stripplot

I've run into some difficulty adding error bars to plots I've created in Python using Seaborn.

I currently have my data in a CSV file, which I read into a data frame:

import pandas as pd
TSMdatabase = 'TSMvsRunmaster.csv'
tsmdf = pd.read_csv(TSMdatabase, sep=',')

The Dataframe has this heading format:

Run,TSMX_Value,TSMX_Error,TSMX+1_Value,TSMX+1_Error,Source

I then loop over the different TSM values and plot each one:

import matplotlib.pyplot as plt
import seaborn as sns

TSM = ['001', '002', '003', '004', '010', '011', '012',
       '013', '016', '017', '101', '102', '104', '105', '106']

sns.set_style("darkgrid")
for tsm in TSM:
    plt.figure()
    # dodge=True separates the hue levels (this option was called split in older Seaborn)
    ax = sns.stripplot(x="Run", y='TSM' + tsm + '_Value', hue="Source", data=tsmdf,
                       jitter=True, palette="Set2", dodge=True)
    plt.xticks(rotation=40)
    plt.title('Run TSM' + tsm + ' Comparison')
    plt.show()

This gives me:

Plot for certain TSM without Error Bars

If I then try to add error bars, I end up with just one error bar in the middle of each sub-dataset, even though each source (Python and Matlab) has its own errors in the data frame.

Does anybody have any ideas? Thank you very much indeed!

Toby Walsh asked Apr 01 '17 16:04


1 Answer

Plotting a mean with error bars is a better fit for sns.pointplot() than for sns.stripplot(). This is indicated in the Seaborn documentation:

sns.pointplot: Show point estimates and confidence intervals using scatter plot glyphs. A point plot represents an estimate of central tendency for a numeric variable by the position of scatter plot points and provides some indication of the uncertainty around that estimate using error bars.

sns.stripplot: Draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

If you have access to all observations, and not just the mean + error, what you want can be achieved simply by:

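import seaborn as sns
# Jupyter/IPython magic; omit it in a plain Python script
%matplotlib inline

tips = sns.load_dataset('tips')
# Recent Seaborn versions require x and y to be passed by keyword
sns.pointplot(x='sex', y='tip', hue='smoker',
    data=tips, dodge=True, join=False)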


You can change what the error bars show from the default 95% confidence interval with the ci parameter, for example to the standard deviation:

sns.pointplot(x='sex', y='tip', hue='smoker',
    data=tips, dodge=True, join=False, ci='sd')


In the above, Seaborn calculated the measurements of error and central tendency. It is a little trickier if you already have these precalculated, since it is currently not possible to use sns.pointplot() with a precalculated error bar. I add the errors using plt.errorbar() after plotting the means using sns.pointplot():

ax = sns.pointplot(x='sex', y='tip', hue='smoker',
    data=tips, dodge=True, join=False, ci=None)

# Find the x,y coordinates for each point
x_coords = []
y_coords = []
for point_pair in ax.collections:
    for x, y in point_pair.get_offsets():
        x_coords.append(x)
        y_coords.append(y)

# Calculate the type of error to plot as the error bars
# Make sure the order is the same as the points were looped over
# (hue level first, then x category)
# Select 'tip' before std() so non-numeric columns are not aggregated
errors = tips.groupby(['smoker', 'sex'])['tip'].std()
colors = ['steelblue']*2 + ['coral']*2
ax.errorbar(x_coords, y_coords, yerr=errors,
    ecolor=colors, fmt=' ', zorder=-1)


You could also use matplotlib directly for the entire plot if you supply the x-positions manually, similar to this example.
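As a rough sketch of the matplotlib-only route when the errors are already in the data frame: the snippet below assumes the question's column layout (Run, Source, a TSM001_Value column and a matching TSM001_Error column). The data values, the plot_with_errors helper and its dodge spacing are made up for illustration and are not part of matplotlib or Seaborn.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data in the question's layout: one value and one
# precalculated error per (Run, Source) pair. The numbers are made up.
df = pd.DataFrame({
    "Run": ["A", "A", "B", "B", "C", "C"],
    "Source": ["Python", "Matlab"] * 3,
    "TSM001_Value": [1.0, 1.2, 0.8, 0.9, 1.5, 1.4],
    "TSM001_Error": [0.10, 0.15, 0.05, 0.08, 0.20, 0.12],
})


def plot_with_errors(df, value_col, error_col, dodge=0.2):
    """Draw one error bar per row, shifting each Source sideways within a Run.

    Illustrative helper: `dodge` is the horizontal gap between
    neighbouring sources, in x-axis units.
    """
    runs = list(pd.unique(df["Run"]))
    sources = list(pd.unique(df["Source"]))
    # Offsets centred on zero, one per source (a single source gets 0)
    offsets = (np.arange(len(sources)) - (len(sources) - 1) / 2) * dodge

    fig, ax = plt.subplots()
    for offset, source in zip(offsets, sources):
        sub = df[df["Source"] == source]
        # Map each run label to its integer position, then shift it
        x = np.array([runs.index(run) for run in sub["Run"]]) + offset
        ax.errorbar(x, sub[value_col].to_numpy(),
                    yerr=sub[error_col].to_numpy(),
                    fmt="o", capsize=3, label=source)

    # Put the run labels back on the integer positions
    ax.set_xticks(range(len(runs)))
    ax.set_xticklabels(runs, rotation=40)
    ax.legend(title="Source")
    return ax


ax = plot_with_errors(df, "TSM001_Value", "TSM001_Error")
```

Call plt.show() afterwards to display the figure. Because every row is drawn with its own yerr entry, each source keeps its own error bar instead of sharing one per run.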

joelostblom answered Sep 25 '22 06:09