
Grouping boxplots in seaborn when input is a DataFrame

I want to plot multiple columns of a pandas DataFrame, all grouped by another column, using the groupby option of seaborn.boxplot. There is a nice answer for a similar problem in matplotlib ("matplotlib: Group boxplots"), but since seaborn.boxplot comes with a groupby option, I thought it would be much easier to do this in seaborn.

Here we go with a reproducible example that fails:

import seaborn as sns
import pandas as pd
df = pd.DataFrame([[2, 4, 5, 6, 1], [4, 5, 6, 7, 2], [5, 4, 5, 5, 1],
                   [10, 4, 7, 8, 2], [9, 3, 4, 6, 2], [3, 3, 4, 4, 1]],
                  columns=['a1', 'a2', 'a3', 'a4', 'b'])

# display(df)
   a1  a2  a3  a4  b
0   2   4   5   6  1
1   4   5   6   7  2
2   5   4   5   5  1
3  10   4   7   8  2
4   9   3   4   6  2
5   3   3   4   4  1

#Plotting by seaborn
sns.boxplot(df[['a1','a2', 'a3', 'a4']], groupby=df.b)

What I get is a plot that completely ignores the groupby option:

[Figure: failed groupby]

Whereas if I do this with a single column, it works, thanks to another SO question ("Seaborn groupby pandas Series"):

sns.boxplot(df.a1, groupby=df.b)

[Figure: seaborn plot that does not fail]

So I would like to get all my columns in one plot (all columns are on a similar scale).

EDIT:

The SO question above was edited and now includes a 'not clean' answer to this problem, but it would be nice if someone had a better idea.

Arman asked Aug 13 '14 11:08




4 Answers

As the other answers note, the boxplot function is limited to plotting a single "layer" of boxplots, and the groupby parameter only has an effect when the input is a Series and you have a second variable you want to use to bin the observations into each box.

However, you can accomplish what I think you're hoping for with the factorplot function, using kind="box". First, though, you'll have to "melt" the sample dataframe into what is called long-form or "tidy" format, where each column is a variable and each row is an observation:

df_long = pd.melt(df, "b", var_name="a", value_name="c")
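To make the reshaping step concrete, here is a minimal pandas-only sketch that rebuilds the question's sample data and melts it. Passing `id_vars` and `value_vars` by keyword is my own choice for readability, not part of the original answer; the result should be the same long-form frame.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame([[2, 4, 5, 6, 1], [4, 5, 6, 7, 2], [5, 4, 5, 5, 1],
                   [10, 4, 7, 8, 2], [9, 3, 4, 6, 2], [3, 3, 4, 4, 1]],
                  columns=['a1', 'a2', 'a3', 'a4', 'b'])

# "b" stays as the identifier column; the four "a" columns are stacked
# into (column name, value) pairs, one row per original cell.
df_long = pd.melt(df, id_vars="b", value_vars=["a1", "a2", "a3", "a4"],
                  var_name="a", value_name="c")

print(df_long.shape)          # 6 rows x 4 stacked columns gives 24 rows
print(list(df_long.columns))  # identifier, variable name, value
```

Each original row contributes four rows to the long frame, so the grouping column b is repeated alongside every value.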

Then it's very simple to plot:

sns.factorplot("a", hue="b", y="c", data=df_long, kind="box")

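In newer seaborn releases, factorplot was renamed to catplot, so the call above may not run as written on a current install. A hedged, self-contained sketch of the equivalent call, assuming a seaborn version that provides `catplot` (the `Agg` backend line is only there so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import seaborn as sns

# Same sample data as in the question.
df = pd.DataFrame([[2, 4, 5, 6, 1], [4, 5, 6, 7, 2], [5, 4, 5, 5, 1],
                   [10, 4, 7, 8, 2], [9, 3, 4, 6, 2], [3, 3, 4, 4, 1]],
                  columns=['a1', 'a2', 'a3', 'a4', 'b'])
df_long = pd.melt(df, id_vars="b", var_name="a", value_name="c")

# One box per (a, b) combination: "a" on the x axis, "b" as the hue.
g = sns.catplot(data=df_long, x="a", y="c", hue="b", kind="box")
```

catplot returns a FacetGrid, so the figure can be saved with `g.savefig(...)` or shown with `matplotlib.pyplot.show()`.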

mwaskom answered Sep 29 '22 20:09


You can use boxplot directly (I imagine this was not possible when the question was asked, but with seaborn 0.6 or later it is).

As explained by @mwaskom, you have to "melt" the sample dataframe into its "long-form" where each column is a variable and each row is an observation:

df_long = pd.melt(df, "b", var_name="a", value_name="c")

# display(df_long.head())
   b   a   c
0  1  a1   2
1  2  a1   4
2  1  a1   5
3  2  a1  10
4  2  a1   9

Then you just plot it:

sns.boxplot(x="a", hue="b", y="c", data=df_long)

[Figure: plot obtained with boxplot]

MrT77 answered Sep 29 '22 20:09


The groupby parameter of seaborn's boxplot takes a Series, not a DataFrame, which is why it's not working.

As a workaround, you can do this:

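import matplotlib.pyplot as plt

# One subplot per value of b, sharing the y axis
fig, ax = plt.subplots(1,2, sharey=True)
for i, grp in enumerate(df.filter(regex="a").groupby(by=df.b)):
    sns.boxplot(grp[1], ax=ax[i])  # grp is a (key, sub-DataFrame) pair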

It gives:

[Figure: resulting plot]

Note that, for this DataFrame, df.filter(regex="a") is equivalent to df[['a1','a2', 'a3', 'a4']]:

   a1  a2  a3  a4
0   2   4   5   6
1   4   5   6   7
2   5   4   5   5
3  10   4   7   8
4   9   3   4   6
5   3   3   4   4

Hope this helps

jrjc answered Sep 29 '22 19:09


It isn't really any better than the answer you linked, but I think the way to achieve this in seaborn is using the FacetGrid feature, as the groupby parameter is only defined for Series passed to the boxplot function.

Here's some code - the pd.melt is necessary because (as best I can tell) the facet mapping can only take individual columns as parameters, so the data need to be turned into a 'long' format.

g = sns.FacetGrid(pd.melt(df, id_vars='b'), col='b')
g.map(sns.boxplot, 'value', 'variable')

[Figure: faceted seaborn boxplot]
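A variant of the faceting approach, sketched with `FacetGrid.map_dataframe` and an explicit `order`, so every facet uses the same category positions. This is an assumption-laden alternative rather than the original answer's code: it draws vertical boxes (variable on x, value on y) instead of horizontal ones, and the `order` list is something I added.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import seaborn as sns

# Same sample data as in the question.
df = pd.DataFrame([[2, 4, 5, 6, 1], [4, 5, 6, 7, 2], [5, 4, 5, 5, 1],
                   [10, 4, 7, 8, 2], [9, 3, 4, 6, 2], [3, 3, 4, 4, 1]],
                  columns=['a1', 'a2', 'a3', 'a4', 'b'])

# pd.melt with default names produces "variable" and "value" columns.
long_df = pd.melt(df, id_vars="b")

# One facet (column of the grid) per distinct value of b.
g = sns.FacetGrid(long_df, col="b")
g.map_dataframe(sns.boxplot, x="variable", y="value",
                order=["a1", "a2", "a3", "a4"])
```

Fixing `order` matters when a facet's subset might lack a category; without it, each facet would infer its own category order from the data it receives.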

chrisb answered Sep 29 '22 19:09