ggplot2 box-whisker plot: show 95% confidence intervals & remove outliers

Tags: plot, r, ggplot2

I'd like a box plot that looks just like the one below, but instead of the default, I'd like it to (1) show 95% confidence intervals and (2) leave out the outliers.

The 95% confidence intervals could mean (i) extending the boxes and removing the whiskers, or (ii) showing just a mean and whiskers, with the boxes removed. If people have other ideas for presenting 95% confidence intervals in a plot like this, I'm open to suggestions. The final goal is to show means and confidence intervals for data across multiple categories on the same plot.

library(ggplot2)

set.seed(1234)
df <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                 rating = c(rnorm(200), rnorm(200, mean = 0.8)))
ggplot(df, aes(x = cond, y = rating, fill = cond)) + geom_boxplot() +
    guides(fill = "none") + coord_flip()

[Figure: default horizontal box plot of rating for conditions A and B]

Image and code source: http://www.cookbook-r.com/Graphs/Plotting_distributions_(ggplot2)/

Asked Jan 23 '14 at 14:01 by Dr. Beeblebrox




2 Answers

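I've used the following to show percentile-based whiskers (here the 5th to 95th percentiles of the data). Note that this describes the spread of the data, not a confidence interval for the mean. Based on what I've read, it's not an uncommon use of box and whisker plots, but it's not the default, so you do need to make it clear what you're showing in the graph.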

quantiles_95 <- function(x) {
  r <- quantile(x, probs=c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

ggplot(df, aes(x=cond, y=rating, fill=cond)) +
    guides(fill = "none") +
    coord_flip() +
    stat_summary(fun.data = quantiles_95, geom="boxplot")

[Figure: horizontal box plot of rating by condition, drawn with stat_summary and quantiles_95]

Instead of using geom_boxplot, use stat_summary with a custom function that returns the limits you want to use:

  • "ymin" is the end of the lower whisker
  • "lower" is the lower edge of the box
  • "middle" is the line inside the box (typically the median)
  • "upper" is the upper edge of the box
  • "ymax" is the end of the upper whisker.

In the provided function (quantiles_95), the built-in quantile function is called with a custom probs argument. As given, the whiskers will span the central 90% of your data, from the 5th to the 95th percentile. The boxes will span the interquartile range, as usual, from the 25th to the 75th percentile.
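To check the "90% of your data" point numerically, the base R sketch below draws 200 standard-normal values the same way group A is drawn in the question and measures how many of them fall inside the whisker range and inside the box that quantiles_95 would produce. The variable names are illustrative only.

```r
# 200 standard-normal values, drawn like group A in the question
set.seed(1234)
x <- rnorm(200)

# The same probabilities that quantiles_95 uses
q <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))

# Share of observations between the whisker ends (5th to 95th percentile)
whisker_coverage <- mean(x >= q[["5%"]] & x <= q[["95%"]])
# Share of observations inside the box (25th to 75th percentile)
box_coverage <- mean(x >= q[["25%"]] & x <= q[["75%"]])

print(c(whiskers = whisker_coverage, box = box_coverage))
```

With 200 observations and quantile()'s default method (type 7), this should report 0.90 for the whiskers and 0.50 for the box, i.e. a 90% range rather than a 95% one.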

You can always change the custom function to choose different quantiles (or even to not use quantiles), but you need to be very careful with this. As pointed out in a comment, there is a certain expectation when one sees a box and whisker plot. If you're using the same shape plot to convey different information, you're likely to confuse people.
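As one example of a custom function that does not use quantiles at all, the sketch below targets option (i) from the question: a box that spans a 95% confidence interval for the mean, with no whiskers. mean_ci_box is a hypothetical helper, not part of ggplot2 or of this answer; it uses a t-based interval, so it assumes roughly normal data or a reasonably large sample per group, and it sets ymin/ymax equal to lower/upper so the whiskers collapse onto the box.

```r
# Hypothetical helper: returns the five values stat_summary(geom = "boxplot")
# expects, built from the mean and a t-based confidence interval for the mean
# instead of from quantiles.
mean_ci_box <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  # Half-width of the t interval: t quantile times the standard error
  half_width <- qt(1 - (1 - conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  # ymin == lower and ymax == upper, so the whiskers collapse onto the box
  # and the box itself spans the confidence interval
  c(ymin = m - half_width, lower = m - half_width, middle = m,
    upper = m + half_width, ymax = m + half_width)
}

# Small example: the box runs from about 1.04 to 4.96, with the mean 3 inside
mean_ci_box(c(1, 2, 3, 4, 5))
```

It can be passed in place of quantiles_95, e.g. `stat_summary(fun.data = mean_ci_box, geom = "boxplot")`. Because readers will otherwise read the box as the interquartile range, label such a plot clearly.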

If you want to get rid of the whiskers, make "ymin" equal to "lower" and "ymax" equal to "upper". If you want whiskers only and no box, set "upper" and "lower" both equal to "middle" (or just use geom_errorbar).
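For option (ii) from the question (a mean with 95% confidence-interval whiskers and no box), the geom_errorbar route can be sketched as follows. mean_ci95 is a hypothetical helper that returns y, ymin and ymax, the columns stat_summary(fun.data = ...) hands to these geoms; it uses a t-based interval, so it assumes roughly normal data or a reasonably large sample per group. ggplot2 also ships summary helpers such as mean_cl_normal() (which relies on the Hmisc package) and mean_se() that could be used instead. The plotting part only runs if ggplot2 is installed.

```r
# Hypothetical helper: mean and t-based 95% CI for the mean, in the
# y / ymin / ymax form used by geom_errorbar and geom_point
mean_ci95 <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  half_width <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  data.frame(y = m, ymin = m - half_width, ymax = m + half_width)
}

# Same data as in the question
set.seed(1234)
df <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                 rating = c(rnorm(200), rnorm(200, mean = 0.8)))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = cond, y = rating, colour = cond)) +
    # Whiskers: the confidence interval for each condition
    stat_summary(fun.data = mean_ci95, geom = "errorbar", width = 0.2) +
    # Point: the mean for each condition
    stat_summary(fun.data = mean_ci95, geom = "point", size = 3) +
    guides(colour = "none") +
    coord_flip()
  print(p)
}
```

Because stat_summary computes the summary separately for each x value, the same code should work unchanged for more than two categories.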

Answered Oct 22 '22 at 02:10 by brianmearns


You can hide the outliers by not drawing them at all, with outlier.shape = NA:

ggplot(df, aes(x=cond, y=rating, fill=cond)) + 
  geom_boxplot(outlier.shape = NA) + 
  guides(fill = "none") + coord_flip()
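Hiding the outliers only stops them from being drawn; the boxes and whiskers are still computed from all of the data, and the hidden points may still count toward the axis range. To see which points are being hidden, the sketch below flags them with a hypothetical helper, flag_outliers, assuming geom_boxplot()'s default rule: points more than 1.5 times the IQR beyond the 25th/75th percentiles, with percentiles from quantile(). It also shows one way to drop those rows before plotting.

```r
# Hypothetical helper: TRUE for values geom_boxplot() would draw as outliers,
# assuming its default rule (beyond 1.5 * IQR from the 25th/75th percentiles,
# with percentiles from quantile()'s default type 7)
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr
}

# Same data as in the question
set.seed(1234)
df <- data.frame(cond = factor(rep(c("A", "B"), each = 200)),
                 rating = c(rnorm(200), rnorm(200, mean = 0.8)))

# Apply the rule separately within each condition, one box per condition
is_out <- unsplit(lapply(split(df$rating, df$cond), flag_outliers), df$cond)
print(table(df$cond, is_out))

# Optional: remove the flagged rows before plotting
df_trimmed <- df[!is_out, ]
```

If you plot df_trimmed, geom_boxplot() recomputes the quartiles on the reduced data, so a few new points may be flagged as outliers.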

You can add the mean to the plot with the stat_summary function:

ggplot(df, aes(x=cond, y=rating, fill=cond)) + 
  geom_boxplot(outlier.shape = NA) + 
  stat_summary(fun = mean, geom = "point", shape = 23, size = 4, fill = "white") +
  guides(fill = "none") + 
  coord_flip()

Answered Oct 22 '22 at 03:10 by Jaap