In theory, the violin plot from the vioplot package is a box plot plus a density function.
In the "box plot part",
the black box corresponds to the IQR (it does, see below), and
the midline should cover the same range as the box plot's whiskers (adjacent values, default 1.5 IQR), yet it does not (see below). Can anyone explain why they differ?
library(vioplot)
a <- rnorm(100)
range(a)
a <- c(a, 2, 8, 2.9, 3, 4, -3, -5)  # add some outliers
par(mfrow = c(1, 2))                # two plots side by side
boxplot(a, range = 1.5)
vioplot(a, range = 1.5)
Generated by the code above: a box plot (left) and a violin plot (right) of a.
Hintze, J. L. and R. D. Nelson (1998). Violin plots: a box plot-density trace synergism. The American Statistician, 52(2):181-4.
A violin plot is more informative than a plain box plot. While a box plot shows only summary statistics such as the median and the interquartile range, the violin plot shows the full distribution of the data. The difference is particularly useful when the data distribution is multimodal (more than one peak).
With the added density information, a violin plot reveals structure in the data that a box plot hides. This is why a violin plot is preferable to a box plot when you have enough data to estimate the density.
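To make the multimodal point concrete, here is a minimal base-R sketch. The data are made up for illustration (two deterministic, normal-shaped clusters centred at -3 and 3), and the peak-counting rule is just a simple assumption, not something vioplot does internally.

```r
# Illustrative bimodal data: two normal-shaped clusters at -3 and 3.
# Built from quantiles rather than rnorm() so the result is reproducible.
x <- c(qnorm(ppoints(200), mean = -3), qnorm(ppoints(200), mean = 3))

# What a box plot reduces the data to: whisker ends, hinges and median.
five <- boxplot.stats(x)$stats
five

# What the density part of a violin plot is based on: a kernel density estimate.
d <- density(x)

# Count local maxima of the estimate, ignoring tiny bumps (below 10% of the peak).
is_peak <- which(diff(sign(diff(d$y))) == -2) + 1
peaks <- is_peak[d$y[is_peak] > 0.1 * max(d$y)]
d$x[peaks]   # locations of the modes
```

The five-number summary puts the median near 0, where there is almost no data, while the density estimate shows two separate modes. That is the structure a box plot alone cannot show.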
The whiskers of a boxplot extend to values known as adjacent values. These are the values in the data that are furthest away from the median on either side of the box, but are still within a distance of 1.5 times the interquartile range from the nearest end of the box (that is, the nearer quartile).
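As a rough sketch of that definition, the adjacent values can be computed by hand in base R. The helper name adjacent_values and the sample data are made up for illustration. Note also that boxplot() uses hinges from fivenum(), which can differ slightly from quantile() for some sample sizes; for this sample they coincide.

```r
# Hypothetical helper: whisker ends (adjacent values) per the definition above.
adjacent_values <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  fence_low  <- q[1] - coef * iqr
  fence_high <- q[2] + coef * iqr
  # The most extreme data points that still lie inside the fences.
  c(lower = min(x[x >= fence_low]), upper = max(x[x <= fence_high]))
}

x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)   # illustrative data with one outlier
adjacent_values(x)                   # lower = 2, upper = 9
boxplot.stats(x)$stats[c(1, 5)]      # whisker ends used by boxplot(): 2 and 9
```

Here the quartiles are 4 and 8, so the fences sit at -2 and 14. The value 30 falls outside and is drawn as an outlier, and the upper whisker stops at 9, the largest observation inside the fence.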
A violin plot is a hybrid of a box plot and a kernel density plot, which shows peaks in the data. It is used to visualize the distribution of numerical data. Unlike a box plot, which can only show summary statistics, a violin plot depicts both summary statistics and the density of each variable.
Let me illustrate this with a simple example:
b <- c(1:10, 20)
par(mfrow = c(1,2))
boxplot(b, range=1.5)
vioplot(b, range=1.5 )
The definition of R's boxplot is (borrowing from ggplot's help on the topic):
The upper whisker extends from the hinge to the highest value that is within 1.5 * IQR of the hinge, where IQR is the inter-quartile range, or distance between the first and third quartiles.
Browsing the source code of vioplot, we see upper[i] <- min(q3[i] + range*iqd, data.max).
Therefore, let us try to reproduce the upper whisker value:
# vioplot draws
quantile(b, 0.75) + 1.5 * IQR(b)
# 16
# boxplot draws
max(b[b <= quantile(b, 0.75) + 1.5 * IQR(b)])
# 10
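The two rules can be put side by side in a small helper. This is only a sketch: whisker_ends is a made-up name, the upper vioplot value follows the source line quoted above, and the lower one is my assumption by symmetry (I have not quoted that line from the source).

```r
# Hypothetical helper comparing the two whisker rules on the same data.
whisker_ends <- function(x, range = 1.5) {
  q1  <- quantile(x, 0.25, names = FALSE)
  q3  <- quantile(x, 0.75, names = FALSE)
  iqd <- q3 - q1
  rbind(
    # vioplot: the fence itself, clipped to the data range
    # (lower end assumed symmetric to the quoted upper end)
    vioplot_style = c(lower = max(q1 - range * iqd, min(x)),
                      upper = min(q3 + range * iqd, max(x))),
    # boxplot: the most extreme observation inside the fence
    boxplot_style = c(lower = min(x[x >= q1 - range * iqd]),
                      upper = max(x[x <= q3 + range * iqd]))
  )
}

b <- c(1:10, 20)
whisker_ends(b)
#               lower upper
# vioplot_style     1    16
# boxplot_style     1    10
```

So the vioplot line ends at the fence value (16), which need not be an observed data point, whereas the boxplot whisker ends at the last observation inside the fence (10). The two only agree when an observation happens to lie exactly on the fence or when the fence is beyond the data range.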