Plotting errorbar with mean and std after grouping

I have the following dataframe:

                    mean       std
insert quality                    
0.0    good     0.009905  0.003662
0.1    good     0.450190  0.281895
       poor     0.376818  0.306806
0.2    good     0.801856  0.243288
       poor     0.643859  0.322378
0.3    good     0.833235  0.172025
       poor     0.698972  0.263266
0.4    good     0.842288  0.141925
       poor     0.706708  0.241269
0.5    good     0.853634  0.118604
       poor     0.685716  0.208073
0.6    good     0.845496  0.118609
       poor     0.675907  0.207755
0.7    good     0.826335  0.133820
       poor     0.656934  0.222823
0.8    good     0.829707  0.130154
       poor     0.627111  0.213046
0.9    good     0.816636  0.137371
       poor     0.589331  0.232756
1.0    good     0.801211  0.147864
       poor     0.554589  0.245867

How can I plot two curves (points with error bars), using the index level "insert" as the x axis and distinguishing the two curves by "quality" (good, poor)? The curves should also have different colors.

I'm kind of stuck: I have produced every kind of plot except the one I need.

Marco Pietrosanto asked Jan 13 '16 13:01


1 Answer

You could loop through the groups in df.groupby('quality') and call group.plot on each group.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'insert': [0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6, 0.6,
    0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 1.0, 1.0],
    'mean': [0.009905, 0.45019, 0.376818, 0.801856, 0.643859, 0.833235,
    0.698972, 0.842288, 0.706708, 0.853634, 0.685716, 0.845496, 0.675907,
    0.826335, 0.656934, 0.829707, 0.627111, 0.816636, 0.589331, 0.801211,
    0.554589],
    'quality': ['good', 'good', 'poor', 'good', 'poor', 'good', 'poor', 'good',
    'poor', 'good', 'poor', 'good', 'poor', 'good', 'poor', 'good', 'poor',
    'good', 'poor', 'good', 'poor'], 
    'std': [0.003662, 0.281895, 0.306806, 0.243288, 0.322378, 0.172025,
    0.263266, 0.141925, 0.241269, 0.118604, 0.208073, 0.118609, 0.207755,
    0.13382, 0.222823, 0.130154, 0.213046, 0.137371, 0.232756, 0.147864,
    0.245867]})

fig, ax = plt.subplots()    # 1

for key, group in df.groupby('quality'):
    group.plot('insert', 'mean', yerr='std', label=key, ax=ax)   # 2

plt.show()


To make both plots appear on the same axes (the numbers match the # 1 and # 2 comments in the code):

  1. Create your own axes object, ax.
  2. Pass that axes object as the ax parameter in each call to group.plot.
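
The same shared-axes idea can also be sketched with matplotlib's Axes.errorbar called directly, which gives explicit control over markers and colors (the question asked for points plus errors in different colors). This is only a minimal sketch and not part of the original answer: the data is a shortened copy of the question's values, and the colors dictionary is an illustrative assumption. If you start from the question's frame, where insert and quality are index levels, df.reset_index() should turn them into ordinary columns first.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copy of the question's data (first rows only), for illustration.
df = pd.DataFrame({
    'insert': [0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3],
    'quality': ['good', 'good', 'poor', 'good', 'poor', 'good', 'poor'],
    'mean': [0.009905, 0.450190, 0.376818, 0.801856, 0.643859,
             0.833235, 0.698972],
    'std': [0.003662, 0.281895, 0.306806, 0.243288, 0.322378,
            0.172025, 0.263266],
})

# Hypothetical color choice; any matplotlib color names would do.
colors = {'good': 'green', 'poor': 'red'}

fig, ax = plt.subplots()
for key, group in df.groupby('quality'):
    # fmt='o-' draws a marker at each point and joins the points with a line;
    # capsize adds small caps to the ends of the error bars.
    ax.errorbar(group['insert'], group['mean'], yerr=group['std'],
                fmt='o-', capsize=3, color=colors[key], label=key)

ax.set_xlabel('insert')
ax.set_ylabel('mean')
ax.legend()
```

Call plt.show() afterwards to display the figure. Use fmt='o' instead of fmt='o-' if you want points without a connecting line.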

It might look better as a bar plot:

# fill in missing data with 0, so the bar plots are aligned
df = df.pivot(index='insert', columns='quality').fillna(0).stack().reset_index()

colors = ['green', 'red']
positions = [0, 1]

fig, ax = plt.subplots()    # fresh axes, so the bars are not drawn onto the earlier line plot

for (key, group), color, pos in zip(df.groupby('quality'), colors, positions):
    # position=0 and position=1 shift the two sets of bars to opposite sides of each tick
    group.plot('insert', 'mean', yerr='std', kind='bar', width=0.4, label=key,
               position=pos, color=color, alpha=0.5, ax=ax)

ax.set_xlim(-1, 11)
plt.show()

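
A grouped bar plot can probably also be produced without a loop, by pivoting the means and the standard deviations into two frames with matching columns and passing the second one as yerr. This is a sketch under assumptions, not the answer's own method: the data is a shortened copy of the question's values, and the names means and stds are introduced here only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Shortened copy of the question's data (first rows only), for illustration.
df = pd.DataFrame({
    'insert': [0.0, 0.1, 0.1, 0.2, 0.2, 0.3, 0.3],
    'quality': ['good', 'good', 'poor', 'good', 'poor', 'good', 'poor'],
    'mean': [0.009905, 0.450190, 0.376818, 0.801856, 0.643859,
             0.833235, 0.698972],
    'std': [0.003662, 0.281895, 0.306806, 0.243288, 0.322378,
            0.172025, 0.263266],
})

# One column per quality; the missing (0.0, 'poor') entry is filled with 0,
# as in the loop-based version.
means = df.pivot(index='insert', columns='quality', values='mean').fillna(0)
stds = df.pivot(index='insert', columns='quality', values='std').fillna(0)

fig, ax = plt.subplots()
# pandas matches the columns of the yerr frame to the plotted columns by name.
means.plot(kind='bar', yerr=stds, color=['green', 'red'], alpha=0.5,
           capsize=3, ax=ax)
ax.set_ylabel('mean')
```

Call plt.show() afterwards to display the figure. Here pandas places the bars for each insert value side by side, so the width and position arguments are not needed.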

unutbu answered Sep 27 '22 22:09