I would like to annotate my violin plot with the number of observations in each group. So the question is essentially the same as this one, except:
Let's take this example from the Seaborn API documentation:
import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)
I'd like to have n=62, n=19, n=87, and n=76 on top of the violins. Is this doable?
A violin plot lets you visualize the distribution of a numeric variable for one or several groups. Seaborn is well suited to building one thanks to its violinplot() function. Violin plots deserve more attention than box plots, which can sometimes hide features of the data.
Violin plots are similar to box plots, except that they also show the probability density of the data at different values, usually smoothed by a kernel density estimator.
Violin plots are used when you want to observe the distribution of numeric data, and are especially useful when you want to make a comparison of distributions between multiple groups. The peaks, valleys, and tails of each group's density curve can be compared to see where groups are similar or different.
To create a violin plot, use the violinplot() function in Seaborn. Pass in the dataframe as well as the variables you want to visualize. You can also pass in just the x variable, and the function will compute the density shown along the other axis: sns.violinplot(x=life_exp) followed by plt.show().
In general, the seaborn categorical plotting functions try to infer the order of categories from the data. If your data have a pandas Categorical datatype, then the default order of the categories can be set there. If the variable passed to the categorical axis looks numerical, the levels will be sorted.
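Category order matters for annotations, because each label has to line up with the right violin. As a minimal sketch of setting that order on the pandas side (the series and the chosen order below are made-up illustration data, not the tips dataset):

```python
import pandas as pd

# Made-up day labels in arbitrary order (illustration only).
days = pd.Series(["Sat", "Thur", "Fri", "Sat", "Thur"])

# Declare the category order explicitly; plotting functions that honor
# a Categorical dtype can then pick this order up.
day_dtype = pd.CategoricalDtype(categories=["Thur", "Fri", "Sat"], ordered=True)
ordered_days = days.astype(day_dtype)

# Sorting now follows the declared category order, not alphabetical order.
category_order = list(ordered_days.cat.categories)
sorted_days = ordered_days.sort_values().tolist()
```

Passing an explicit order= argument to the plotting call is the other common way to pin the order.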
Other options include drawing a violinplot with nested grouping by two categorical variables, drawing split violins to compare across the hue variable, controlling violin order by passing an explicit order, scaling the violin width by the number of observations in each bin, and drawing the quartiles as horizontal lines instead of a mini-box.
In this situation, I like to precompute the annotated values and incorporate them into the categorical axis. In other words, precompute e.g., "Thurs, N = xxx"
That looks like this:
import seaborn as sns
sns.set_style("whitegrid")
ax = (
    sns.load_dataset("tips")
    # number of observations per day, repeated on every row of that day
    .assign(count=lambda df: df['day'].map(df.groupby(by=['day'])['total_bill'].count()))
    # axis label that combines the day with its count, e.g. "Thur\nN = 62"
    .assign(grouper=lambda df: df['day'].astype(str) + '\nN = ' + df['count'].astype(str))
    # sort so the labels appear in day order on the axis
    .sort_values(by='day')
    .pipe((sns.violinplot, 'data'), x="grouper", y="total_bill")
    .set(xlabel='Day of the Week', ylabel='Total Bill (USD)')
)
You first need to store all the x and y positions (using your dataset for that) in order to use ax.text, then a simple for loop can write everything in the desired positions:
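import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)
# y position of each label: the median total_bill of each day
yposlist = tips.groupby(['day'])['total_bill'].median().tolist()
# x position of each label: categories sit at 0, 1, 2, ...
xposlist = range(len(yposlist))
# labels are hardcoded here, in the same order as the days on the axis
stringlist = ['n = 62','n = 19','n = 87','n = 76']
for i in range(len(stringlist)):
    ax.text(xposlist[i], yposlist[i], stringlist[i])
plt.show()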
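The labels above are typed in by hand and placed at each group's median. A minimal sketch of a variant that computes the counts from the data and writes them just above the plot area could look like the following. Note that count_labels is a hypothetical helper introduced here for illustration, and the small demo frame is made-up data standing in for the tips dataset so the example runs without a download.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def count_labels(data, x, y, order):
    """Hypothetical helper: return (x_position, text) for each group in `order`."""
    counts = data.groupby(x, observed=True)[y].count()
    return [(i, f"n = {counts.loc[level]}") for i, level in enumerate(order)]


# Made-up stand-in for the tips dataset (illustration only).
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sat", "Sat"],
    "total_bill": [10.0, 12.5, 15.0, 8.0, 20.0, 11.0, 14.0, 18.0, 30.0],
})
order = ["Thur", "Fri", "Sat"]

fig, ax = plt.subplots()
sns.violinplot(x="day", y="total_bill", data=demo, order=order, ax=ax)

labels = count_labels(demo, "day", "total_bill", order)
for xpos, text in labels:
    # x is in data coordinates (category index); y is in axes coordinates,
    # so 1.01 sits just above the top edge of the plot for every violin.
    ax.text(xpos, 1.01, text, transform=ax.get_xaxis_transform(),
            ha="center", va="bottom")
```

Passing the same order list to both violinplot and the helper keeps each label aligned with its violin. Call plt.show() afterwards to display the figure.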