Align violin plots with dodged box plots

I have this data frame

set.seed(1234)
x <- rnorm(80, 5, 1)
df <- data.frame(groups = c(rep("group1",20),
                           rep("group2",20),
                           rep("group3",20),
                           rep("group4",20)),
                    value = x,
                    type = c(rep("A", 10),
                           rep("B", 10),
                           rep("A", 10),
                           rep("B", 10),
                           rep("A", 10),
                           rep("B", 10),
                           rep("A", 10),
                           rep("B", 10)))

I would like to plot it as a violin plot, overlaid with a narrow box plot and grouped by 'type':

library(ggplot2)

ggplot(data = df, aes(x = groups, y = value, fill = type)) +
  geom_violin() +
  geom_boxplot(width = 0.1, outlier.colour = NA)

However, the box plots do not align with the violin plots. Which argument is missing to tell ggplot to overlay them correctly?

Thanks!

Asked by Sergio.pv, Nov 19 '14 08:11


1 Answer

You need to set the width of the dodging explicitly for both geoms:

dodge <- position_dodge(width = 0.4)

ggplot(data = df, aes(x = groups, y = value, fill = type)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.1, outlier.colour = NA, position = dodge)


For a more thorough explanation, see the question "What is the width argument in position_dodge?"
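
One way to confirm the alignment without inspecting the image is to compare the x positions ggplot2 computes for each layer. The following is a minimal sketch, not part of the original answer: it rebuilds the same data more compactly, applies the shared dodge, and uses `layer_data()` to read the dodged x centres of the violin layer (layer 1) and the box plot layer (layer 2). The helper name `layer_centres` is hypothetical and introduced only for illustration.

```r
library(ggplot2)

# Same data as in the question, built more compactly
set.seed(1234)
df <- data.frame(
  groups = rep(paste0("group", 1:4), each = 20),
  value  = rnorm(80, 5, 1),
  type   = rep(rep(c("A", "B"), each = 10), times = 4)
)

# One dodge object shared by both geoms, as in the answer
dodge <- position_dodge(width = 0.4)

p <- ggplot(df, aes(x = groups, y = value, fill = type)) +
  geom_violin(position = dodge) +
  geom_boxplot(width = 0.1, outlier.colour = NA, position = dodge)

# Hypothetical helper: distinct x centres of layer i after dodging
layer_centres <- function(plot, i) {
  sort(unique(round(as.numeric(layer_data(plot, i)$x), 6)))
}

violin_x <- layer_centres(p, 1)
box_x    <- layer_centres(p, 2)

# TRUE when every box plot sits at the centre of its violin
print(isTRUE(all.equal(violin_x, box_x)))
```

If the two geoms are left at their default positions instead, each one is dodged according to its own width, which is presumably why the narrow boxes in the question end up closer together than the violins.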

Answered by Henrik, Oct 03 '22 00:10