Looking at the documentation about violon plots with seaborn, I would like to know how to plot two series of violin plots on the same axis (point 1) and that they are comparable (point 2).
About point 1, I would like to reproduce that plot for each sex :
fig, ax = plt.subplots()
sns.violinplot(x="day", y="total_bill", hue="smoker",
data=tips, split=True, ax=ax)
I could do it on two subplots:
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(211)
sns.violinplot(x="day", y="total_bill", hue="smoker",
data=tips[tips.sex == "Female"], split=True, ax=ax)
ax = fig.add_subplot(212)
sns.violinplot(x="day", y="total_bill", hue="smoker",
data=tips[tips.sex == "Male"], split=True, ax=ax)
I would like to plot the two violin plot series on the same matplotlib axes.
Another point is about the width of the violin plot. I do not clearly understand if the violins are normalized or not and how ? I assume that the width is computed for each plot. On the example above the width is computed for Female for the first subplot and for Male for the second subplot. Thus can I compare directly the densities ? I suppose that I can compare the shapes but, for example, I cannot compare the amount of Male smoker on Monday and Female ones ? Is there a way to manage the normalization of the violins ?
A violin plot is more informative than a plain box plot. While a box plot only shows summary statistics such as mean/median and interquartile ranges, the violin plot shows the full distribution of the data. The difference is particularly useful when the data distribution is multimodal (more than one peak).
In this article, we will explore how to create a subplot or multi-dimensional plot in seaborn, It is a useful approach to draw subplot instances of the same plot on different subsets of your dataset. It allows a viewer to quickly extract a large amount of data about complex information.
A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared.
For you first point, there is no way of doing this in Seaborn. Check out my comments for possible workarounds, but in brief, I don't think the time spent would be worthwhile.
For you second question, the scale
and scale_hue
parameters to violinplot
controls how the violin patch is normalized/scaled:
scale : {“area”, “count”, “width”}, optional
The method used to scale the width of each violin. If area, each violin will have the same area. If count, the width of the violins will be scaled by the number of observations in that bin. If width, each violin will have the same width.
scale_hue : bool, optional
When nesting violins using a hue variable, this parameter determines whether the scaling is computed within each level of the major grouping variable (scale_hue=True) or across all the violins on the plot (scale_hue=False).
The defaults are 'area'
and False
. You can see below how varying these parameters affects the violins. For example, if you want to compare between plots and represent the absolute counts truthfully, you could set scale='count'
and scale_hue=False
. Note that the violins would still be scaled to the maximum count within the plot (and not in the dataset), so in our case, the biggest violin for females would represent ~40 observations while the biggest violin for males would represent ~25 observations.
fig, axes = plt.subplots(4, 2, figsize=(10, 16), sharey='row')
axes_cols = (axes.flatten()[::2], axes.flatten()[1::2])
for (sex_name, sex), axes_col in zip(tips.groupby('sex'), axes_cols):
sns.countplot(x="day", hue="smoker", data=sex, ax=axes_col[0])
for scale, ax in zip(['area', 'count', 'width'], axes_col[1:]):
sns.violinplot(x="day", y="total_bill", hue="smoker",
data=sex, split=True, ax=ax, scale=scale)
ax.set_title('scale = {}'.format(scale), y=0.95)
sns.despine()
fig.tight_layout()
Adding scale_hue=False
:
As said Joel Ostblom in ist answer, there is no way to plot several violin plots natively in seaborn. I suggest here a solution following his hints. I also add a count plots on the top, as he did in his answer.
I do not really know the tips data set, thus the plot below is certainly meaningless.
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="whitegrid", font_scale=1.5)
tips = sns.load_dataset("tips")
# here you add a new column with the two categorical data you want
tips["sex_time"] = tips[["sex", "time"]].apply(lambda x: "_".join(x), axis=1)
fig, axes = plt.subplots(nrows=2, ncols=1, figsize=(8, 10),
sharex=True, gridspec_kw=dict(height_ratios=(1, 3), hspace=0))
# select the order you want:
order=["Female_Lunch", "Male_Lunch", "Female_Dinner", "Male_Dinner"]
sns.countplot(
data=tips, x="sex_time", hue="smoker",
order=order,
ax=axes[0]
)
sns.violinplot(
x="sex_time", y="total_bill", hue="smoker", data=tips,
split=True, scale="count", scale_hue=False, inner="stick",
order=order,
ax=axes[1]
)
axes[1].set_xticklabels(["Lunch (Female)", "Lunch (Male)", "Dinner (Female)", "Dinner (Male)"])
axes[1].set_xlabel("Time (Sex)")
axes[1].legend("")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With