Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to plot two violin plot series on the same graph using seaborn?

Looking at the documentation about violon plots with seaborn, I would like to know how to plot two series of violin plots on the same axis (point 1) and that they are comparable (point 2).

About point 1, I would like to reproduce that plot for each sex :

fig, ax = plt.subplots()
sns.violinplot(x="day", y="total_bill", hue="smoker",
                    data=tips, split=True, ax=ax)

I could do it on two subplots:

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(211)
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips[tips.sex == "Female"], split=True, ax=ax)

ax = fig.add_subplot(212)
sns.violinplot(x="day", y="total_bill", hue="smoker",
               data=tips[tips.sex == "Male"], split=True, ax=ax)

I would like to plot the two violin plot series on the same matplotlib axes.

Another point is about the width of the violin plot. I do not clearly understand if the violins are normalized or not and how ? I assume that the width is computed for each plot. On the example above the width is computed for Female for the first subplot and for Male for the second subplot. Thus can I compare directly the densities ? I suppose that I can compare the shapes but, for example, I cannot compare the amount of Male smoker on Monday and Female ones ? Is there a way to manage the normalization of the violins ?

like image 860
Ger Avatar asked Nov 21 '17 09:11

Ger


People also ask

Are violin plots better or worse than box plots?

A violin plot is more informative than a plain box plot. While a box plot only shows summary statistics such as mean/median and interquartile ranges, the violin plot shows the full distribution of the data. The difference is particularly useful when the data distribution is multimodal (more than one peak).

Can you make subplots using Seaborn?

In this article, we will explore how to create a subplot or multi-dimensional plot in seaborn, It is a useful approach to draw subplot instances of the same plot on different subsets of your dataset. It allows a viewer to quickly extract a large amount of data about complex information.

What is Violinplot in Seaborn?

A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared.


2 Answers

For you first point, there is no way of doing this in Seaborn. Check out my comments for possible workarounds, but in brief, I don't think the time spent would be worthwhile.

For you second question, the scale and scale_hue parameters to violinplot controls how the violin patch is normalized/scaled:

scale : {“area”, “count”, “width”}, optional

The method used to scale the width of each violin. If area, each violin will have the same area. If count, the width of the violins will be scaled by the number of observations in that bin. If width, each violin will have the same width.

scale_hue : bool, optional

When nesting violins using a hue variable, this parameter determines whether the scaling is computed within each level of the major grouping variable (scale_hue=True) or across all the violins on the plot (scale_hue=False).

The defaults are 'area' and False. You can see below how varying these parameters affects the violins. For example, if you want to compare between plots and represent the absolute counts truthfully, you could set scale='count' and scale_hue=False. Note that the violins would still be scaled to the maximum count within the plot (and not in the dataset), so in our case, the biggest violin for females would represent ~40 observations while the biggest violin for males would represent ~25 observations.

fig, axes = plt.subplots(4, 2, figsize=(10, 16), sharey='row')
axes_cols = (axes.flatten()[::2], axes.flatten()[1::2])

for (sex_name, sex), axes_col in zip(tips.groupby('sex'), axes_cols):
    sns.countplot(x="day", hue="smoker", data=sex, ax=axes_col[0])
    for scale, ax in zip(['area', 'count', 'width'], axes_col[1:]):
        sns.violinplot(x="day", y="total_bill", hue="smoker",
            data=sex, split=True, ax=ax, scale=scale)
        ax.set_title('scale = {}'.format(scale), y=0.95)
sns.despine()
fig.tight_layout()

enter image description here

Adding scale_hue=False: enter image description here

like image 180
joelostblom Avatar answered Oct 21 '22 01:10

joelostblom


As said Joel Ostblom in ist answer, there is no way to plot several violin plots natively in seaborn. I suggest here a solution following his hints. I also add a count plots on the top, as he did in his answer.

I do not really know the tips data set, thus the plot below is certainly meaningless.

import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid", font_scale=1.5)

tips = sns.load_dataset("tips")
# here you add a new column with the two categorical data you want
tips["sex_time"] = tips[["sex", "time"]].apply(lambda x: "_".join(x), axis=1)

fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 10), 
                         sharex=True, gridspec_kw=dict(height_ratios=(1, 3), hspace=0))

# select the order you want:
order=["Female_Lunch", "Male_Lunch", "Female_Dinner", "Male_Dinner"]

sns.countplot(
    data=tips, x="sex_time", hue="smoker", 
    order=order,
    ax=axes[0]
)

sns.violinplot(
    x="sex_time", y="total_bill", hue="smoker", data=tips, 
    split=True, scale="count", scale_hue=False, inner="stick",
    order=order,
    ax=axes[1]
)
axes[1].set_xticklabels(["Lunch (Female)", "Lunch (Male)", "Dinner (Female)", "Dinner (Male)"])
axes[1].set_xlabel("Time (Sex)")
axes[1].legend("")

enter image description here

like image 44
Ger Avatar answered Oct 21 '22 01:10

Ger