 

Violin plot: How is the adjacent value range determined, and why is it different from boxplot?

In theory, the violin plot drawn by the vioplot package is a boxplot plus a density trace.

In the "boxplot part",

  • the black box corresponds to the IQR (indeed, see below), and

  • the thin midline should span the same range as the boxplot whiskers (the adjacent values, default 1.5 × IQR), yet it does not (see below). Can anyone explain why they differ?

    library(vioplot)                    # provides vioplot()
    a <- rnorm(100)
    range(a)
    a <- c(a, 2, 8, 2.9, 3, 4, -3, -5)  # add some outliers

    par(mfrow = c(1, 2))                # two panels side by side
    boxplot(a, range = 1.5)
    vioplot(a, range = 1.5)
    

Generated by the code above:

[Figure: boxplot (left) vs. violin plot (right) generated by the lines above]

Hintze, J. L. and R. D. Nelson (1998). Violin plots: a box plot-density trace synergism. The American Statistician, 52(2):181-4.

Asked Oct 02 '15 by bud.dugong


People also ask

What is the difference between violin plot and boxplot?

A violin plot is more informative than a plain box plot. While a box plot only shows summary statistics such as the median and the interquartile range, a violin plot also shows the estimated distribution of the data. The difference matters most when the distribution is multimodal (has more than one peak).

What does a violin plot show that a boxplot does not?

With the added density information, a violin plot can reveal structure in the data that a boxplot hides. This is why a violin plot is preferable to a boxplot when you have enough data to estimate the density reliably.
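A minimal sketch of that point, assuming a hypothetical, deterministic two-cluster vector x (not from the original question): its median falls in the empty gap between the clusters, which a box summary cannot reveal, while base R's density() (the kind of kernel density estimate a violin plot draws) shows two separate peaks.

```r
# Hypothetical bimodal data: two evenly spaced clusters, nothing in (-1, 1)
x <- c(seq(-3, -1, length.out = 50), seq(1, 3, length.out = 50))

# Box-plot summary: the median lands in the empty gap between the clusters
med <- median(x)

# Kernel density estimate (Gaussian kernel, default "nrd0" bandwidth)
d <- density(x)

# Grid indices where the estimated density has a local maximum
peaks <- which(diff(sign(diff(d$y))) == -2) + 1
n_peaks <- length(peaks)

med           # essentially 0, a value no observation takes
d$x[peaks]    # approximate peak locations, one per cluster
```

The boxplot of x would show a box centred on 0, suggesting data where there is none; the density makes the two clusters obvious.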

What is the adjacent value of box plot?

The whiskers of a boxplot extend to values known as adjacent values. These are the values in the data that are furthest away from the median on either side of the box, but are still within a distance of 1.5 times the interquartile range from the nearest end of the box (that is, the nearer quartile).
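Written as a formula (with $Q_1$, $Q_3$ the lower and upper quartiles and $\mathrm{IQR} = Q_3 - Q_1$), the adjacent values are the most extreme observations still inside the fences, so a whisker always ends on an actual data point:

```latex
\[
\text{upper adjacent} = \max\{\, x_i : x_i \le Q_3 + 1.5\,\mathrm{IQR} \,\},
\qquad
\text{lower adjacent} = \min\{\, x_i : x_i \ge Q_1 - 1.5\,\mathrm{IQR} \,\}
\]
```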

What does violin plot tell us?

A violin plot is a hybrid of a box plot and a kernel density plot, which shows peaks in the data. It is used to visualize the distribution of numerical data. Unlike a box plot that can only show summary statistics, violin plots depict summary statistics and the density of each variable.




1 Answer

Let me illustrate this with a simple example:

library(vioplot)
b <- c(1:10, 20)

# Same data, same range = 1.5: boxplot (left) vs. violin plot (right)
par(mfrow = c(1, 2))
boxplot(b, range = 1.5)
vioplot(b, range = 1.5)


R's boxplot uses the following whisker rule (borrowing the wording from ggplot2's help on the topic, which follows the same convention):

The upper whisker extends from the hinge to the highest value that is within 1.5 * IQR of the hinge, where IQR is the inter-quartile range, or distance between the first and third quartiles.

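Browsing the source code of vioplot, we see upper[i] <- min(q3[i] + range*iqd, data.max), where iqd is the interquartile distance. In other words, vioplot ends the whisker at the fence q3 + 1.5 * IQR itself (capped at the data maximum), not at the largest observation inside that fence.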
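To make the difference concrete, here is a minimal sketch of both rules as two hypothetical helper functions (boxplot_upper() and vioplot_upper() are illustrative names, not part of either package). Both take the quartiles from quantile(), as the vioplot line above does; the only difference is whether the whisker snaps back to the largest observation inside the fence.

```r
# Hypothetical helper: boxplot-style upper whisker, i.e. the largest
# observation that is still within range * IQR of the upper quartile
boxplot_upper <- function(x, range = 1.5) {
  q3 <- unname(quantile(x, 0.75))
  fence <- q3 + range * IQR(x)
  max(x[x <= fence])
}

# Hypothetical helper mirroring the quoted vioplot line:
# upper <- min(q3 + range * iqd, data.max)
vioplot_upper <- function(x, range = 1.5) {
  q3 <- unname(quantile(x, 0.75))
  min(q3 + range * IQR(x), max(x))
}

b <- c(1:10, 20)
boxplot_upper(b)  # 10: an actual data point
vioplot_upper(b)  # 16: the fence itself, which is not in the data
```

The two rules agree only when no observation lies beyond the fence (both then return the data maximum) or when an observation sits exactly on the fence.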

Therefore, let us try to reproduce the upper whisker value:

# vioplot draws
quantile(b, 0.75) + 1.5 * IQR(b)
# 16

# boxplot draws
max(b[b <= quantile(b, 0.75) + 1.5 * IQR(b)])
# 10
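As a cross-check, base R's boxplot.stats() (which boxplot() calls to compute what it draws) reports the same upper whisker. Note that it takes its hinges from fivenum() rather than quantile(); for this b the two happen to agree (3.5 and 8.5), but on other data the hinges, and therefore the fences, may differ slightly between boxplot and vioplot as well.

```r
b <- c(1:10, 20)

# Statistics boxplot() draws: lower whisker, lower hinge, median,
# upper hinge, upper whisker (coef = 1.5 corresponds to range = 1.5)
s <- boxplot.stats(b, coef = 1.5)
s$stats  # 1.0  3.5  6.0  8.5 10.0
s$out    # 20: the point beyond the upper fence

# Tukey hinges used by boxplot.stats() vs. the quartiles used in vioplot's code
fivenum(b)[c(2, 4)]                 # 3.5 8.5
unname(quantile(b, c(0.25, 0.75)))  # 3.5 8.5
```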
Answered Oct 08 '22 by tonytonov