
Is it possible to annotate a seaborn violin plot with number of observations in each group?

I would like to annotate my violin plot with the number of observations in each group. So the question is essentially the same as this one, except:

  • python instead of R,
  • seaborn instead of ggplot, and
  • violin plots instead of boxplots

Let's take this example from the Seaborn API documentation:

import seaborn as sns
sns.set_style("whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)

I'd like to have n=62, n=19, n=87, and n=76 on top of the violins. Is this doable?

posdef asked Oct 16 '17 14:10



2 Answers

In this situation, I like to precompute the annotated values and incorporate them into the categorical axis. In other words, precompute labels such as "Thur, N = xxx" and plot against those instead of the raw day names.

That looks like this:

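import seaborn as sns

sns.set_style("whitegrid")
ax = (
    sns.load_dataset("tips")
       # number of observations per day, mapped back onto every row
       .assign(count=lambda df: df['day'].map(df.groupby(by=['day'])['total_bill'].count()))
       # axis label combining the day name and its count
       .assign(grouper=lambda df: df['day'].astype(str) + '\nN = ' + df['count'].astype(str))
       .sort_values(by='day')  # keep the violins in weekday order
       .pipe((sns.violinplot, 'data'), x="grouper", y="total_bill")
)
# Axes.set() returns a list of setter results, not the Axes,
# so it is called separately to keep ax pointing at the plot.
ax.set(xlabel='Day of the Week', ylabel='Total Bill (USD)')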

Paul H answered Oct 21 '22 09:10


To use ax.text you first need the x and y position of each label, which you can compute from your dataset. A simple for loop can then write each label at its position:

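import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips)

# "day" is categorical, so groupby returns the groups in the same
# order seaborn draws them (Thur, Fri, Sat, Sun).
grouped = tips.groupby("day")["total_bill"]
yposlist = grouped.median().tolist()   # label height: the group median
xposlist = range(len(yposlist))        # violins sit at x = 0, 1, 2, ...
stringlist = [f"n = {n}" for n in grouped.count()]  # counts computed, not hard-coded

for x, y, label in zip(xposlist, yposlist, stringlist):
    ax.text(x, y, label, ha="center")  # center each label on its violin

plt.show()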
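
If the labels should sit above the violins rather than at the medians, one option is to give ax.text a blended transform, so that x is read in data coordinates and y as a fraction of the axes height. The following is a minimal sketch under assumptions: the helper name annotate_counts and the small made-up DataFrame are illustrative only, and the category order is passed explicitly so that the label positions match the violins.

```python
import pandas as pd
import seaborn as sns


def annotate_counts(ax, data, x, y, order):
    """Write 'n = <count>' just above the plot area, centered over each violin.

    Hypothetical helper for illustration; `order` must be the same
    category order that was passed to sns.violinplot.
    """
    counts = data.groupby(x)[y].count()
    texts = []
    for pos, level in enumerate(order):
        texts.append(
            ax.text(
                pos, 1.01, f"n = {counts.loc[level]}",
                transform=ax.get_xaxis_transform(),  # x in data units, y in axes fraction
                ha="center", va="bottom",
            )
        )
    return texts


# Made-up data: 3 observations in group "a", 5 in group "b".
df = pd.DataFrame({
    "group": ["a"] * 3 + ["b"] * 5,
    "value": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0, 8.0, 10.0],
})
order = ["a", "b"]

ax = sns.violinplot(x="group", y="value", data=df, order=order)
labels = annotate_counts(ax, df, "group", "value", order)
print([t.get_text() for t in labels])
```

With the tips dataset, the equivalent call would pass x="day", y="total_bill" and order=["Thur", "Fri", "Sat", "Sun"]. Because y is given as an axes fraction, the labels stay above the plot area even if the y-limits change later.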

Vinícius Figueiredo answered Oct 21 '22 09:10