I would like to plot boxplots without outliers with ggplot, giving focus on the boxes and whiskers only
For example:
p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot() + facet_wrap(~clarity, scales="free")
gives facetted boxplots with outliers
I can suppress outliers with outlier.size=NA:
p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot(outlier.size=NA) + facet_wrap(~clarity, scales="free")
which gives
Here, the y-axis scale is the same as in the original plot, just the outliers don't show up. How can I now modify the scale to "zoom in" on each panel according to the whisker ends?
I can reset ylim like this
ylim1 = boxplot.stats(diamonds$price)$stats[c(1, 5)]
and then replot
p1 + geom_boxplot(outlier.size=NA)
+ facet_wrap(~clarity, scales="free")
+ coord_cartesian(ylim = ylim1*1.05)
but this doesn't work on the facets:
Is there a way to "facet_wrap" the boxplots.stats function?
Edit:
I've tried to calculate the boxplot statistics dynamically, but this doesn't seem to work.
give.stats <- function(x){return(boxplot.stats(x)$stats[c(1,5)])}
p1 + geom_boxplot(outlier.size=NA) +
facet_wrap(~clarity, scales="free") +
coord_cartesian(ylim = give.stats)
> Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument
Any more ideas would be much appreciated.
We can remove outliers in R by setting the outlier. shape argument to NA. In addition, the coord_cartesian() function will be used to reject all outliers that exceed or below a given quartile. The y-axis of ggplot2 is not automatically adjusted.
You can use outlier. size = -1 but the best option to hide outliers is the one you suggested: outlier. shape = NA .
The default box plot in ggplot doesn't add the error bar lines, but you can add them with stat_boxplot , setting geom = "errorbar" . Note that you can change its width with width .
The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.
It can be done with stat_summary and custom statistic calculation function:
calc_boxplot_stat <- function(x) {
coef <- 1.5
n <- sum(!is.na(x))
# calculate quantiles
stats <- quantile(x, probs = c(0.0, 0.25, 0.5, 0.75, 1.0))
names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
iqr <- diff(stats[c(2, 4)])
# set whiskers
outliers <- x < (stats[2] - coef * iqr) | x > (stats[4] + coef * iqr)
if (any(outliers)) {
stats[c(1, 5)] <- range(c(stats[2:4], x[!outliers]), na.rm = TRUE)
}
return(stats)
}
ggplot(diamonds, aes(x=cut, y=price, fill=cut)) +
stat_summary(fun.data = calc_boxplot_stat, geom="boxplot") +
facet_wrap(~clarity, scales="free")
The stats calculation function is generic, thus no need for data manipulation before plotting.
It is also possible to set whiskers to 10% and 90% :
calc_stat <- function(x) {
coef <- 1.5
n <- sum(!is.na(x))
# calculate quantiles
stats <- quantile(x, probs = c(0.1, 0.25, 0.5, 0.75, 0.9))
names(stats) <- c("ymin", "lower", "middle", "upper", "ymax")
return(stats)
}
ggplot(diamonds, aes(x=cut, y=price, fill=cut)) +
stat_summary(fun.data = calc_stat, geom="boxplot") +
facet_wrap(~clarity, scales="free")
Through outlier.size=NA
you make the outliers disappear, this is not an option to ignore the outliers plotting the boxplots. So, the plots are generated considering the (invisible) outliers. There seems to be no option for what you want. In order to make the boxplots as you need them I would calculate the quantiles myself and generate the boxplots based on these quantiles, like in the following example:
stat<-tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x))
stats<-unlist(tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x)$stats))
df<-data.frame(
cut=rep(rep(unlist(dimnames(stat)[1]),each=5),length(unlist(dimnames(stat)[2]))),
clarity=rep(unlist(dimnames(stat)[2]),each=25),
price=unlist(tapply(diamonds$price,list(diamonds$cut,diamonds$clarity),function(x) boxplot.stats(x)$stats)))
ggplot(df,aes(x=cut,y=price,fill=cut))+geom_boxplot()+facet_wrap(~clarity,scales="free")
Which gives (note that the orders in the plot are different now):
Ok, I figured out a more easy way to do this by commenting out some lines in the original ggplot boxplot function and calling the modified function.
I am not a programmer, no idea if this is a good or robust thing to do, but it seems to work fine for now.
This is the modified function I am using:
#modified version of geom_boxplot
require(ggplot2)
geom_boxplot_noOutliers <- function (mapping = NULL, data = NULL, stat = "boxplot",
position = "dodge", outlier.colour = NULL,
outlier.shape = NULL, outlier.size = NULL,
notch = FALSE, notchwidth = .5, varwidth = FALSE,
...) {
#outlier_defaults <- ggplot2:::Geom$find('point')$default_aes()
#outlier.colour <- outlier.colour %||% outlier_defaults$colour
#outlier.shape <- outlier.shape %||% outlier_defaults$shape
#outlier.size <- outlier.size %||% outlier_defaults$size
GeomBoxplot_noOutliers$new(mapping = mapping, data = data, stat = stat,
position = position, outlier.colour = outlier.colour,
outlier.shape = outlier.shape, outlier.size = outlier.size, notch = notch,
notchwidth = notchwidth, varwidth = varwidth, ...)
}
GeomBoxplot_noOutliers <- proto(ggplot2:::Geom, {
objname <- "boxplot_noOutliers"
reparameterise <- function(., df, params) {
df$width <- df$width %||%
params$width %||% (resolution(df$x, FALSE) * 0.9)
# if (!is.null(df$outliers)) {
# suppressWarnings({
# out_min <- vapply(df$outliers, min, numeric(1))
# out_max <- vapply(df$outliers, max, numeric(1))
# })
#
# df$ymin_final <- pmin(out_min, df$ymin)
# df$ymax_final <- pmax(out_max, df$ymax)
# }
# if `varwidth` not requested or not available, don't use it
if (is.null(params) || is.null(params$varwidth) || !params$varwidth || is.null(df$relvarwidth)) {
df$xmin <- df$x - df$width / 2
df$xmax <- df$x + df$width / 2
} else {
# make `relvarwidth` relative to the size of the largest group
df$relvarwidth <- df$relvarwidth / max(df$relvarwidth)
df$xmin <- df$x - df$relvarwidth * df$width / 2
df$xmax <- df$x + df$relvarwidth * df$width / 2
}
df$width <- NULL
if (!is.null(df$relvarwidth)) df$relvarwidth <- NULL
df
}
draw <- function(., data, ..., fatten = 2, outlier.colour = NULL, outlier.shape = NULL, outlier.size = 2,
notch = FALSE, notchwidth = .5, varwidth = FALSE) {
common <- data.frame(
colour = data$colour,
size = data$size,
linetype = data$linetype,
fill = alpha(data$fill, data$alpha),
group = data$group,
stringsAsFactors = FALSE
)
whiskers <- data.frame(
x = data$x,
xend = data$x,
y = c(data$upper, data$lower),
yend = c(data$ymax, data$ymin),
alpha = NA,
common)
box <- data.frame(
xmin = data$xmin,
xmax = data$xmax,
ymin = data$lower,
y = data$middle,
ymax = data$upper,
ynotchlower = ifelse(notch, data$notchlower, NA),
ynotchupper = ifelse(notch, data$notchupper, NA),
notchwidth = notchwidth,
alpha = data$alpha,
common)
# if (!is.null(data$outliers) && length(data$outliers[[1]] >= 1)) {
# outliers <- data.frame(
# y = data$outliers[[1]],
# x = data$x[1],
# colour = outlier.colour %||% data$colour[1],
# shape = outlier.shape %||% data$shape[1],
# size = outlier.size %||% data$size[1],
# fill = NA,
# alpha = NA,
# stringsAsFactors = FALSE)
# outliers_grob <- GeomPoint$draw(outliers, ...)
# } else {
outliers_grob <- NULL
# }
ggname(.$my_name(), grobTree(
outliers_grob,
GeomSegment$draw(whiskers, ...),
GeomCrossbar$draw(box, fatten = fatten, ...)
))
}
guide_geom <- function(.) "boxplot_noOutliers"
draw_legend <- function(., data, ...) {
data <- aesdefaults(data, .$default_aes(), list(...))
gp <- with(data, gpar(col=colour, fill=alpha(fill, alpha), lwd=size * .pt, lty = linetype))
gTree(gp = gp, children = gList(
linesGrob(0.5, c(0.1, 0.25)),
linesGrob(0.5, c(0.75, 0.9)),
rectGrob(height=0.5, width=0.75),
linesGrob(c(0.125, 0.875), 0.5)
))
}
default_stat <- function(.) StatBoxplot
default_pos <- function(.) PositionDodge
default_aes <- function(.) aes(weight=1, colour="grey20", fill="white", size=0.5, alpha = NA, shape = 16, linetype = "solid")
required_aes <- c("x", "lower", "upper", "middle", "ymin", "ymax")
})
I saved it as an r file and use source
to load it:
library(ggplot2)
library(scales)
#load functions
source("D:/Eigene Dateien/Scripte/R-Scripte/myfunctions/geomBoxplot_noOutliers.r")
Now I can just plot without outliers using geom_boxplot_noOutliers
and everything works fine even with facets :-)
p1 <- ggplot(diamonds, aes(x=cut, y=price, fill=cut))
p1 + geom_boxplot_noOutliers() + facet_wrap(~clarity, scales="free")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With