
Python Side-by-side box plots on same figure

I am trying to generate a box plot in Python 2.7 for each categorical value in column E from the Pandas dataframe below

          A         B         C         D  E
0  0.647366  0.317832  0.875353  0.993592  1
1  0.504790  0.041806  0.113889  0.445370  2
2  0.769335  0.120647  0.749565  0.935732  3
3  0.215003  0.497402  0.795033  0.246890  1
4  0.841577  0.211128  0.248779  0.250432  1
5  0.045797  0.710889  0.257784  0.207661  4
6  0.229536  0.094308  0.464018  0.402725  3
7  0.067887  0.591637  0.949509  0.858394  2
8  0.827660  0.348025  0.507488  0.343006  3
9  0.559795  0.820231  0.461300  0.921024  1

I would be willing to do this with Matplotlib or any other plotting library. So far, the code below plots all the categories combined on one plot. Here is the code to generate the above data and produce the plot:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

# Data
df = pd.DataFrame(np.random.rand(10,4),columns=list('ABCD'))
df['E'] = [1,2,3,1,1,4,3,2,3,1]

# Boxplot
bp = ax.boxplot(df.iloc[:,:-1].values, widths=0.2)
plt.show()

In this example, the categories are 1, 2, 3 and 4. I would like to plot separate boxplots side by side on the same figure for categories 1 and 2 only, and show the category names in the legend.

Is there a way to do this?

Additional Information:

The output should look similar to the 3rd figure from here - replace "Yes","No" by "1","2".

edesz asked May 12 '16 at 15:05




1 Answer

An addition to @Paul_H's answer.

Side-by-side boxplots on a single matplotlib.axes.Axes, without seaborn:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


df = pd.DataFrame(np.random.rand(10,4), columns=list('ABCD'))
df['E'] = [1, 2, 1, 1, 1, 2, 1, 2, 2, 1]

mask_e = df['E'] == 1

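# prepare data: for each column, the E == 1 rows followed by the remaining (E == 2) rows
data_to_plot = [df.loc[mask_e, 'A'], df.loc[~mask_e, 'A'],
                df.loc[mask_e, 'B'], df.loc[~mask_e, 'B'],
                df.loc[mask_e, 'C'], df.loc[~mask_e, 'C'],
                df.loc[mask_e, 'D'], df.loc[~mask_e, 'D']]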

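# positions defaults to range(1, N+1), where N is the number of boxplots to be drawn.
# We shift them slightly so that the two boxes of each column are visually grouped.
# Label suffix: 1 means E == 1, 0 means the remaining rows (E == 2).
# (Matplotlib 3.9+ deprecates `labels` in favor of `tick_labels`.)
plt.figure(figsize=(10, 6))
box = plt.boxplot(data_to_plot,
                  positions=[1, 1.6, 2.5, 3.1, 4, 4.6, 5.5, 6.1],
                  labels=['A1','A0','B1','B0','C1','C0','D1','D0'])
plt.show()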
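
The question also asks to plot only categories 1 and 2 and to show the category names in a legend, which the code above does not cover. Below is a minimal sketch of one way this might be done on the question's data. The helper name grouped_boxplot, its parameters (value_cols, by, categories, width) and the color choice are assumptions made for illustration; they are not part of pandas, Matplotlib, or the original answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch


def grouped_boxplot(df, value_cols, by, categories, ax=None, width=0.35):
    """Draw one box per (column, category) pair, grouped by column.

    Hypothetical helper for illustration; not part of pandas or Matplotlib.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 6))
    # One color per category, taken from Matplotlib's default color cycle.
    cycle = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    colors = [cycle[i % len(cycle)] for i in range(len(categories))]
    centers = np.arange(1, len(value_cols) + 1)

    for i, (cat, color) in enumerate(zip(categories, colors)):
        # Shift this category's boxes left or right of each group center.
        offset = (i - (len(categories) - 1) / 2) * width
        data = [df.loc[df[by] == cat, col].to_numpy() for col in value_cols]
        bp = ax.boxplot(data, positions=centers + offset,
                        widths=0.8 * width, patch_artist=True)
        for patch in bp["boxes"]:
            patch.set_facecolor(color)

    # boxplot() puts a tick under every box; replace them with one per column.
    ax.set_xticks(centers)
    ax.set_xticklabels(value_cols)
    ax.set_xlim(0.5, len(value_cols) + 0.5)

    # Proxy patches carry the category names into the legend.
    handles = [Patch(facecolor=c, edgecolor="black", label=str(cat))
               for cat, c in zip(categories, colors)]
    ax.legend(handles=handles, title=by)
    return ax


if __name__ == "__main__":
    # Same data layout as in the question; only categories 1 and 2 are drawn.
    df = pd.DataFrame(np.random.rand(10, 4), columns=list("ABCD"))
    df["E"] = [1, 2, 3, 1, 1, 4, 3, 2, 3, 1]
    grouped_boxplot(df, list("ABCD"), by="E", categories=[1, 2])
    plt.show()
```

Selecting rows with df[by] == cat inside the loop means categories 3 and 4 are simply never drawn, so no separate filtering step is needed. The legend uses proxy Patch handles because boxplot() does not register legend entries on its own.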

banderlog013 answered Sep 18 '22 at 10:09