Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

No outliers in ggplot boxplot with facet_wrap

Tags:

r

ggplot2

I would like to plot boxplots without outliers with ggplot, giving focus on the boxes and whiskers only

For example:

p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot() + facet_wrap(~clarity, scales="free")

gives facetted boxplots with outliers

enter image description here

I can suppress outliers with outlier.size=NA:

p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot(outlier.size=NA) + facet_wrap(~clarity, scales="free")

which gives

enter image description here

Here, the y-axis scale is the same as in the original plot, just the outliers don't show up. How can I now modify the scale to "zoom in" on each panel according to the whisker ends?

I can reset ylim like this

ylim1 = boxplot.stats(diamonds$price)$stats[c(1, 5)]

and then replot

p1 + geom_boxplot(outlier.size=NA) 
   + facet_wrap(~clarity, scales="free") 
   +  coord_cartesian(ylim = ylim1*1.05)

but this doesn't work on the facets:

enter image description here

Is there a way to "facet_wrap" the boxplots.stats function?

Edit:

I've tried to calculate the boxplot statistics dynamically, but this doesn't seem to work.

give.stats <- function(x){return(boxplot.stats(x)$stats[c(1,5)])}

p1 + geom_boxplot(outlier.size=NA) + 
  facet_wrap(~clarity, scales="free") + 
  coord_cartesian(ylim = give.stats)

> Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument

Any more ideas would be much appreciated.

like image 821
user3460194 Avatar asked Aug 04 '14 18:08

user3460194


People also ask

How do you show no outliers in boxplot in R?

We can remove outliers in R by setting the outlier. shape argument to NA. In addition, the coord_cartesian() function will be used to reject all outliers that exceed or below a given quartile. The y-axis of ggplot2 is not automatically adjusted.

How do you remove whiskers from a boxplot?

You can use outlier. size = -1 but the best option to hide outliers is the one you suggested: outlier. shape = NA .

How do you add error bars in geom boxplot?

The default box plot in ggplot doesn't add the error bar lines, but you can add them with stat_boxplot , setting geom = "errorbar" . Note that you can change its width with width .

What do Ggplot Boxplots show?

The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.


3 Answers

It can be done with stat_summary and custom statistic calculation function:

calc_boxplot_stat <- function(x) {
  coef <- 1.5
  n <- sum(!is.na(x))
  # calculate quantiles
  stats <- quantile(x, probs = c(0.0, 0.25, 0.5, 0.75, 1.0))
  names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
  iqr <- diff(stats[c(2, 4)])
  # set whiskers
  outliers <- x < (stats[2] - coef * iqr) | x > (stats[4] + coef * iqr)
  if (any(outliers)) {
    stats[c(1, 5)] <- range(c(stats[2:4], x[!outliers]), na.rm = TRUE)
  }
  return(stats)
}

ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + 
    stat_summary(fun.data = calc_boxplot_stat, geom="boxplot") + 
    facet_wrap(~clarity, scales="free")

output figure

The stats calculation function is generic, thus no need for data manipulation before plotting.

It is also possible to set whiskers to 10% and 90% :

calc_stat <- function(x) {
  coef <- 1.5
  n <- sum(!is.na(x))
  # calculate quantiles
  stats <- quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
  names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
  return(stats)
}

ggplot(diamonds, aes(x=cut, y=price, fill=cut)) + 
    stat_summary(fun.data = calc_stat, geom="boxplot") + 
    facet_wrap(~clarity, scales="free")

Output figure with 10% and 90% whiskers

like image 69
Nikolay Avatar answered Oct 06 '22 23:10

Nikolay


Through outlier.size=NA you make the outliers disappear, this is not an option to ignore the outliers plotting the boxplots. So, the plots are generated considering the (invisible) outliers. There seems to be no option for what you want. In order to make the boxplots as you need them I would calculate the quantiles myself and generate the boxplots based on these quantiles, like in the following example:

stat<-tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x))
stats<-unlist(tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x)$stats))

df<-data.frame(
  cut=rep(rep(unlist(dimnames(stat)[1]),each=5),length(unlist(dimnames(stat)[2]))),
  clarity=rep(unlist(dimnames(stat)[2]),each=25),
  price=unlist(tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x)$stats)))

ggplot(df,aes(x=cut,y=price,fill=cut))+geom_boxplot()+facet_wrap(~clarity,scales="free")

Which gives (note that the orders in the plot are different now):

enter image description here

like image 28
DatamineR Avatar answered Oct 06 '22 22:10

DatamineR


Ok, I figured out a more easy way to do this by commenting out some lines in the original ggplot boxplot function and calling the modified function.

I am not a programmer, no idea if this is a good or robust thing to do, but it seems to work fine for now.

This is the modified function I am using:

#modified version of geom_boxplot

require(ggplot2)
geom_boxplot_noOutliers <- function (mapping = NULL, data = NULL, stat = "boxplot",
                          position = "dodge", outlier.colour = NULL,
                          outlier.shape = NULL, outlier.size = NULL,
                          notch = FALSE, notchwidth = .5, varwidth = FALSE,
                          ...) {

  #outlier_defaults <- ggplot2:::Geom$find('point')$default_aes()

  #outlier.colour   <- outlier.colour %||% outlier_defaults$colour
  #outlier.shape    <- outlier.shape  %||% outlier_defaults$shape
  #outlier.size     <- outlier.size   %||% outlier_defaults$size

  GeomBoxplot_noOutliers$new(mapping = mapping, data = data, stat = stat,
                  position = position, outlier.colour = outlier.colour,
                  outlier.shape = outlier.shape, outlier.size = outlier.size, notch = notch,
                  notchwidth = notchwidth, varwidth = varwidth, ...)
}

GeomBoxplot_noOutliers <- proto(ggplot2:::Geom, {
  objname <- "boxplot_noOutliers"

  reparameterise <- function(., df, params) {
    df$width <- df$width %||%
      params$width %||% (resolution(df$x, FALSE) * 0.9)

  # if (!is.null(df$outliers)) {
  #    suppressWarnings({
  #      out_min <- vapply(df$outliers, min, numeric(1))
  #      out_max <- vapply(df$outliers, max, numeric(1))
  #    })
  #    
  #    df$ymin_final <- pmin(out_min, df$ymin)
  #    df$ymax_final <- pmax(out_max, df$ymax)
  #   }

    # if `varwidth` not requested or not available, don't use it
    if (is.null(params) || is.null(params$varwidth) || !params$varwidth || is.null(df$relvarwidth)) {
      df$xmin <- df$x - df$width / 2
      df$xmax <- df$x + df$width / 2
    } else {
      # make `relvarwidth` relative to the size of the largest group
      df$relvarwidth <- df$relvarwidth / max(df$relvarwidth)
      df$xmin <- df$x - df$relvarwidth * df$width / 2
      df$xmax <- df$x + df$relvarwidth * df$width / 2
    }
    df$width <- NULL
    if (!is.null(df$relvarwidth)) df$relvarwidth <- NULL

    df
  }

  draw <- function(., data, ..., fatten = 2, outlier.colour = NULL, outlier.shape = NULL, outlier.size = 2,
                   notch = FALSE, notchwidth = .5, varwidth = FALSE) {
    common <- data.frame(
      colour = data$colour,
      size = data$size,
      linetype = data$linetype,
      fill = alpha(data$fill, data$alpha),
      group = data$group,
      stringsAsFactors = FALSE
    )

    whiskers <- data.frame(
      x = data$x,
      xend = data$x,
      y = c(data$upper, data$lower),
      yend = c(data$ymax, data$ymin),
      alpha = NA,
      common)

    box <- data.frame(
      xmin = data$xmin,
      xmax = data$xmax,
      ymin = data$lower,
      y = data$middle,
      ymax = data$upper,
      ynotchlower = ifelse(notch, data$notchlower, NA),
      ynotchupper = ifelse(notch, data$notchupper, NA),
      notchwidth = notchwidth,
      alpha = data$alpha,
      common)

  #  if (!is.null(data$outliers) && length(data$outliers[[1]] >= 1)) {
  #    outliers <- data.frame(
  #      y = data$outliers[[1]],
  #      x = data$x[1],
  #      colour = outlier.colour %||% data$colour[1],
  #      shape = outlier.shape %||% data$shape[1],
  #      size = outlier.size %||% data$size[1],
  #      fill = NA,
  #      alpha = NA,
  #      stringsAsFactors = FALSE)
  #    outliers_grob <- GeomPoint$draw(outliers, ...)
  #  } else {
      outliers_grob <- NULL
  #  }

    ggname(.$my_name(), grobTree(
      outliers_grob,
      GeomSegment$draw(whiskers, ...),
      GeomCrossbar$draw(box, fatten = fatten, ...)
    ))
  }

  guide_geom <- function(.) "boxplot_noOutliers"
  draw_legend <- function(., data, ...)  {
    data <- aesdefaults(data, .$default_aes(), list(...))
    gp <- with(data, gpar(col=colour, fill=alpha(fill, alpha), lwd=size * .pt, lty = linetype))
    gTree(gp = gp, children = gList(
      linesGrob(0.5, c(0.1, 0.25)),
      linesGrob(0.5, c(0.75, 0.9)),
      rectGrob(height=0.5, width=0.75),
      linesGrob(c(0.125, 0.875), 0.5)
    ))
  }

  default_stat <- function(.) StatBoxplot
  default_pos <- function(.) PositionDodge
  default_aes <- function(.) aes(weight=1, colour="grey20", fill="white", size=0.5, alpha = NA, shape = 16, linetype = "solid")
  required_aes <- c("x", "lower", "upper", "middle", "ymin", "ymax")

})

I saved it as an r file and use source to load it:

library(ggplot2)
library(scales)

#load functions
source("D:/Eigene Dateien/Scripte/R-Scripte/myfunctions/geomBoxplot_noOutliers.r")

Now I can just plot without outliers using geom_boxplot_noOutliers and everything works fine even with facets :-)

p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot_noOutliers() + facet_wrap(~clarity, scales="free")

enter image description here

like image 37
user3460194 Avatar answered Oct 06 '22 23:10

user3460194