
Plot only one side/half of the violin plot

I would like to plot only one half of each violin (similar to the plots created by stat_density_ridges from ggridges). A minimal working example (MWE):

library(ggplot2)

dframe = data.frame(val = c(), group = c())
for(i in 1:5){
  offset = i - 3
  dframe = rbind(dframe, 
                 data.frame(val = rnorm(n = 50, mean = 0 - offset), group = i)
                 )
}
dframe$group = as.factor(dframe$group)

ggplot(data = dframe, aes(x = group, y = val)) + 
  geom_violin()      

This produces a plot like this:

[image: full violin plots, one per group]

However, I would like one that looks like this:

[image: half violin plots, one per group]

Ideally, the violins would also be scaled to about 1.5 to 2 times their default width.

asked Aug 27 '18 07:08 by Exocom


People also ask

What is a truncated violin plot?

Violin plots come in two main varieties: "truncated" or "extended". With a "truncated" violin plot, the curve of the violin extends only to the minimum and maximum values in the data set. At those values, the curve is trimmed, forming a horizontal line connecting both sides of the violin.
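
In base R terms, the difference between the two varieties comes down to where the density estimate is evaluated. A minimal sketch, assuming a made-up sample `x` and using only stats::density(): by default the estimate extends `cut = 3` bandwidths beyond the data (the "extended" shape), while `cut = 0` stops at the minimum and maximum (the "truncated" shape).

```r
set.seed(42)
x <- rnorm(50)                  # illustrative sample, not from the question

d_ext   <- density(x)           # default cut = 3: extends 3 bandwidths past the extremes
d_trunc <- density(x, cut = 0)  # evaluated only between min(x) and max(x)

range(x)
range(d_trunc$x)                # same as range(x)
range(d_ext$x)                  # wider by 3 * d_ext$bw on each side
```

In ggplot2, geom_violin(trim = TRUE), the default, gives the truncated form and trim = FALSE the extended one.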

What is the correct way of plotting violin plot?

The kernel density plot used for creating the violin plot is the same as the one added on top of the histogram. Wider sections of the violin plot represent a higher probability of observations taking a given value; thinner sections correspond to a lower probability.

What are the plotted elements of a violin plot?

Typically, a violin plot will include all the data that is in a box plot: a marker for the data's median; a box or marker representing the interquartile range; and, assuming the number of samples is not too large, all sample points.

What do the violin plots reveal that box plots do not?

A violin plot is a hybrid of a box plot and a kernel density plot, which shows peaks in the data. It is used to visualize the distribution of numerical data. Unlike a box plot that can only show summary statistics, violin plots depict summary statistics and the density of each variable.


2 Answers

There's a neat solution by @David Robinson (the original code is from his gist; I made only a couple of modifications).

He creates a new geom (GeomFlatViolin) that keeps only one side of the violin: the flat edge stays at x, and the curved edge is offset to the left in proportion to violinwidth:

data <- transform(data, 
                  xmaxv = x,
                  xminv = x + violinwidth * (xmin - x))

This layer also accepts a width argument.
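
To see what that transform does, here is a minimal base-R sketch of the same outline computation for a single group. It is an illustration under assumptions, not the gist's code: half_violin_outline() is a hypothetical helper, the sample is made up, and the density is simply rescaled to a maximum of 1 as a stand-in for the violinwidth column that ggplot2's ydensity stat supplies.

```r
# Hypothetical helper: outline of a left-facing half violin centred at `x`
half_violin_outline <- function(values, x = 1, width = 0.9) {
  d <- density(values)            # kernel density estimate (512 points by default)
  violinwidth <- d$y / max(d$y)   # assumption: rescale so the widest point equals 1
  xmin <- x - width / 2           # left edge of the bounding box

  # Curved side, bottom to top: same formula as xminv in the answer
  curved <- data.frame(x = x + violinwidth * (xmin - x), y = d$x)
  # Flat side, top to bottom: same as xmaxv = x
  flat <- data.frame(x = x, y = rev(d$x))

  outline <- rbind(curved, flat)
  rbind(outline, outline[1, ])    # close the polygon
}

set.seed(1)
outline <- half_violin_outline(rnorm(50))
range(outline$x)                  # spans from x - width / 2 up to x
```

The resulting coordinates could be passed to polygon(outline$x, outline$y) on an existing base plot. A larger `width` pushes the curved side further from the flat edge, which is how the 1.5 to 2 times scaling asked for in the question is obtained.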


Example:

# Using the OP's data
# For wider violins, use e.g. geom_flat_violin(width = 1.5)
ggplot(dframe, aes(group, val)) +
    geom_flat_violin()

[image: resulting half-violin plot]

Code:

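library(ggplot2)
library(dplyr)  # provides %>%, group_by() and mutate()
# plyr must also be installed: draw_group() below calls plyr::arrange()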


"%||%" <- function(a, b) {
  if (!is.null(a)) a else b
}

geom_flat_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
                        position = "dodge", trim = TRUE, scale = "area",
                        show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomFlatViolin,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      trim = trim,
      scale = scale,
      ...
    )
  )
}

GeomFlatViolin <-
  ggproto("GeomFlatViolin", Geom,
          setup_data = function(data, params) {
            data$width <- data$width %||%
              params$width %||% (resolution(data$x, FALSE) * 0.9)

            # ymin, ymax, xmin, and xmax define the bounding rectangle for each group
            data %>%
              group_by(group) %>%
              mutate(ymin = min(y),
                     ymax = max(y),
                     xmin = x - width / 2,
                     xmax = x)
          },

          draw_group = function(data, panel_scales, coord) {
            # Find the points for the line to go all the way around
            data <- transform(data, 
                              xmaxv = x,
                              xminv = x + violinwidth * (xmin - x))

            # Make sure it's sorted properly to draw the outline
            newdata <- rbind(plyr::arrange(transform(data, x = xminv), y),
                             plyr::arrange(transform(data, x = xmaxv), -y))

            # Close the polygon: set first and last point the same
            # Needed for coord_polar and such
            newdata <- rbind(newdata, newdata[1,])

            ggplot2:::ggname("geom_flat_violin", GeomPolygon$draw_panel(newdata, panel_scales, coord))
          },

          draw_key = draw_key_polygon,

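          # Note: ggplot2 3.4.0 and later use `linewidth` rather than `size` for
          # polygon outlines, so this default may need renaming on newer versions.
          default_aes = aes(weight = 1, colour = "grey20", fill = "white", size = 0.5,
                            alpha = NA, linetype = "solid"),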

          required_aes = c("x", "y")
)

answered Oct 16 '22 19:10 by pogibas


The see package also has a function, geom_violinhalf(), that seems to do exactly what you want (the right-hand plot in the example below). It behaves mostly like geom_violin(), except that it does not have all of geom_violin()'s arguments (for example, draw_quantiles is missing).

library(ggplot2)
library(see)
library(patchwork)

p <- ggplot(mtcars, aes(factor(cyl), mpg))
p1 <- p + geom_violin() + ggtitle("geom_violin")
p2 <- p + see::geom_violinhalf() + ggtitle("see::geom_violinhalf")

## show them next to each other
p1 + p2

Created on 2020-04-30 by the reprex package (v0.3.0)
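
Because geom_violinhalf() lacks draw_quantiles, one possible workaround is to compute the quantiles yourself and draw them as an extra layer. The following is only a sketch of the first step, in base R, using the same mtcars data. The name `q` is made up for illustration, and how the values are overlaid (for example as points or short segments) is left open.

```r
# Per-group quartiles of mpg, computed without any plotting package
q <- do.call(rbind, lapply(split(mtcars$mpg, mtcars$cyl), quantile,
                           probs = c(0.25, 0.5, 0.75)))
q  # one row per cyl value, columns "25%", "50%", "75%"
```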

answered Oct 16 '22 19:10 by Matifou