
Plotting a swarmplot on a violinplot changes the ylim and truncates the violins

import matplotlib.pyplot as plt
import numpy as np  # for sample data
import pandas as pd
import seaborn as sns

# sample data
np.random.seed(365)
rows = 60
data1 = {'Type 1': ['a'] * rows,
         'Total': np.random.normal(loc=25, scale=3, size=rows)}
data2 = {'Type 1': ['b'] * rows,
         'Total': np.random.normal(loc=60, scale=7, size=rows)}
df = pd.concat([pd.DataFrame(d) for d in [data1, data2]], ignore_index=True)

# plot
plt.figure(figsize=(5, 4))
sns.violinplot(x='Type 1', y='Total', data=df, inner=None)
sns.swarmplot(x='Type 1', y='Total', data=df, color='#000000', size=3)

[Image: violins with the swarmplot overlaid; the violins are truncated]

compared to the plot without the swarmplot:

[Image: violins without the swarmplot]

The violins are truncated, as the image above shows. How can I change the displayed y-axis range?

I've tried changing figsize. I didn't have this issue until I overlaid the swarmplot onto the violinplot.

df

  Type 1      Total
0      a  25.503763
1      a  26.570516
2      a  27.452127
3      a  30.111537
4      a  18.559157
...
115      b  67.389032
116      b  67.337122
117      b  59.193256
118      b  56.356515
119      b  57.353019
Asked Oct 15 '25 02:10 by Cullen Wise

1 Answer

  • When adding sns.swarmplot, or sns.stripplot, to sns.violinplot, the y-axis limits change: they shrink, so the violins are truncated.
    • This occurs with both the explicit "Axes" interface and the implicit "pyplot" interface, as shown in this plot.
    • Using sns.catplot with kind='violin' and .map_dataframe with sns.swarmplot produces the same issue, as shown in this plot.
    • This doesn’t occur when plotting sns.swarmplot on sns.boxplot, as shown in this plot.
  • Tested in Python 3.11.2, matplotlib 3.7.1, seaborn 0.12.2
import seaborn as sns
import matplotlib.pyplot as plt

# sample data
df = sns.load_dataset('geyser')

# plot
sns.violinplot(data=df, x='kind', y='duration', inner=None)
print('ylim with 1 plot', plt.ylim())
sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3)
print('ylim with both plots', plt.ylim())
ylim with 1 plot (1.079871611291212, 5.607761736565478)
ylim with both plots (1.425, 5.2749999999999995)
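The two print calls above can be packaged into a reusable check that records the y-limits before and after any plotting call. This is a minimal sketch, not part of seaborn or matplotlib: `ylim_before_after` is a hypothetical helper name, and the demo uses plain matplotlib `ax.plot` calls (assumed stand-ins for the seaborn calls) so it runs without seaborn or the geyser dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def ylim_before_after(ax, draw):
    """Hypothetical helper: return the y-limits of `ax` before and after
    calling `draw(ax)`, e.g. draw=lambda a: sns.swarmplot(..., ax=a)."""
    before = ax.get_ylim()
    draw(ax)
    after = ax.get_ylim()
    return before, after


fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# A second line with a wider y-range makes matplotlib's autoscaling widen the limits.
before, after = ylim_before_after(ax, lambda a: a.plot([0, 1], [-5, 5]))
print("ylim before:", before)
print("ylim after: ", after)
plt.close(fig)
```

In this demo the limits grow, which is ordinary autoscaling. In the seaborn case above they shrink instead, so the helper only detects that an overlay changed the limits; it does not explain why.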


Resolution

  • Here are three options to resolve the issue:
    1. Capture the ylim values after plotting the sns.violinplot, and set ylim to those values after plotting the sns.swarmplot.
    2. Set ylim to specific values after plotting sns.swarmplot.
    3. Plot sns.swarmplot first, then sns.violinplot.
  • To have the y-axis start at the "origin", set y_bot = 0.
  • These options use matplotlib.pyplot.ylim, matplotlib.axes.Axes.set_ylim, and matplotlib.axes.Axes.get_ylim.

1.

sns.violinplot(data=df, x='kind', y='duration', inner=None)
y_bot, y_top = plt.ylim()
sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3)
plt.ylim(y_bot, y_top)
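Option 1 can also be packaged as a context manager, so the capture and the restore always happen together. This is a sketch under assumptions: `keep_ylim` is a hypothetical name, and the optional `bottom` argument is an illustrative way to cover the "origin" case (y_bot = 0). The demo uses matplotlib `ax.plot` calls as stand-ins for sns.violinplot and sns.swarmplot so it runs without seaborn.

```python
from contextlib import contextmanager

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt


@contextmanager
def keep_ylim(ax, bottom=None):
    """Hypothetical helper generalizing option 1: remember ax's y-limits on
    entry and restore them on exit. Pass bottom=0 to start the axis at 0."""
    y_bot, y_top = ax.get_ylim()
    try:
        yield ax
    finally:
        # Restore even if the plotting code inside the block raises.
        ax.set_ylim(y_bot if bottom is None else bottom, y_top)


fig, ax = plt.subplots()
ax.plot([0, 1], [1, 2])          # stand-in for sns.violinplot(..., ax=ax)
expected = ax.get_ylim()
with keep_ylim(ax):
    ax.plot([0, 1], [-10, 10])   # stand-in for sns.swarmplot(..., ax=ax)
restored = ax.get_ylim()
plt.close(fig)

fig2, ax2 = plt.subplots()
ax2.plot([0, 1], [1, 2])
top = ax2.get_ylim()[1]
with keep_ylim(ax2, bottom=0):   # keep the top limit, force the bottom to 0
    ax2.plot([0, 1], [-10, 10])
origin = ax2.get_ylim()
plt.close(fig2)
```

With seaborn, the swarmplot call from option 1 would go inside the with block, e.g. `with keep_ylim(ax): sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3, ax=ax)`.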


2.

sns.violinplot(data=df, x='kind', y='duration', inner=None)
sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3)
plt.ylim(1, 6)


3.

# plot
sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3)
print('ylim with 1 plot', plt.ylim())
sns.violinplot(data=df, x='kind', y='duration', inner=None)
print('ylim with both plots', plt.ylim())
ylim with 1 plot (1.425, 5.2749999999999995)
ylim with both plots (1.079871611291212, 5.607761736565478)


Preferably, use the explicit interface

  • Why be explicit?

plt.figure and .add_subplot

fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot()
sns.violinplot(data=df, x='kind', y='duration', inner=None, ax=ax)
y_bot, y_top = ax.get_ylim()
sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3, ax=ax)
ax.set_ylim(y_bot, y_top)

plt.subplots

fig, ax = plt.subplots(figsize=(8, 5))
sns.violinplot(data=df, x='kind', y='duration', inner=None, ax=ax)
y_bot, y_top = ax.get_ylim()
sns.swarmplot(data=df, x='kind', y='duration', color='#000000', size=3, ax=ax)
ax.set_ylim(y_bot, y_top)
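The explicit-interface version of option 1 can be wrapped in a reusable function. This is a sketch, assuming seaborn 0.12 or later: `violin_with_swarm` is a hypothetical name, not a seaborn API, and the synthetic DataFrame only mimics the shape of the question's sample data (two groups of 60 normally distributed values), so the geyser dataset does not need to be downloaded.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def violin_with_swarm(data, x, y, ax=None, **swarm_kws):
    """Hypothetical wrapper around option 1 with the explicit interface:
    draw the violins, remember the y-limits they produced, overlay the
    swarm, then restore those limits so the violins are not truncated."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 5))
    sns.violinplot(data=data, x=x, y=y, inner=None, ax=ax)
    y_bot, y_top = ax.get_ylim()
    swarm_kws.setdefault("color", "#000000")
    swarm_kws.setdefault("size", 3)
    sns.swarmplot(data=data, x=x, y=y, ax=ax, **swarm_kws)
    ax.set_ylim(y_bot, y_top)
    return ax


# Synthetic data shaped like the question's df (two groups of 60 values).
rng = np.random.default_rng(365)
df = pd.DataFrame({
    "Type 1": ["a"] * 60 + ["b"] * 60,
    "Total": np.concatenate([rng.normal(25, 3, 60), rng.normal(60, 7, 60)]),
})

fig, ax = plt.subplots(figsize=(8, 5))
ax = violin_with_swarm(df, x="Type 1", y="Total", ax=ax)

# Reference: the limits a violin plot alone produces on a fresh Axes.
ref_fig, ref_ax = plt.subplots(figsize=(8, 5))
sns.violinplot(data=df, x="Type 1", y="Total", inner=None, ax=ref_ax)
```

Because the limits are restored explicitly, the combined plot should end up with the same y-range as the violins alone, whatever the swarmplot call does to the limits.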

df[['duration', 'kind']].head()

  • This real data is similar to the random sample in the OP.
   duration   kind
0     3.600   long
1     1.800  short
2     3.333   long
3     2.283  short
4     4.533   long
Answered Oct 17 '25 17:10 by Trenton McKinney
