
Include indication of extreme outliers in ggplot

I have a small number of extreme outliers in my dataset, which makes the boxplots difficult to read:

library(ggplot2)
mtcars$mpg[1] <- 60
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()

[image: the resulting boxplot]

Hence, I would like to indicate the extreme outliers like this:

[image: mock-up of the desired plot]

Any ideas how to do this in ggplot2? Transforming the axis is not an option for me...

gosz asked Apr 05 '15 22:04



1 Answer

This is a start:

library("ggplot2")
mtcars$mpg[1:2] <- c(50,60)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()

Define max value:

maxval <- 40

Use dplyr (could also be done in base R or plyr) to extract outliers and put together the text string:

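library("dplyr")
# keep only the rows above the cutoff, then build one comma-separated
# label per cylinder group
dd <- mtcars %>%
    filter(mpg > maxval) %>%
    group_by(cyl) %>%
    summarise(outlier_txt = paste(mpg, collapse = ","))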
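
The same summary can be sketched in base R, as mentioned above. This is only an illustration under stated assumptions: `dat` and `dd_base` are hypothetical names, and `dat` is a copy of `mtcars` so the built-in dataset is left untouched.

```r
# Work on a copy with the same two artificial outliers as above
dat <- mtcars
dat$mpg[1:2] <- c(50, 60)
maxval <- 40

# Rows above the cutoff
out <- subset(dat, mpg > maxval)

# One comma-separated label per cylinder group
dd_base <- aggregate(mpg ~ cyl, data = out,
                     FUN = function(v) paste(v, collapse = ","))

# Rename the aggregated column to match the dplyr version
names(dd_base)[names(dd_base) == "mpg"] <- "outlier_txt"
dd_base
```

`aggregate()` returns a plain data frame rather than a tibble, but it can be passed to the `data` argument of a ggplot2 layer in the same way.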

Set max y value and add an arrow plus label:

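library("grid") # needed for arrow() function
p2 <- p + geom_boxplot() +
    # note: scale limits discard values above maxval before the
    # boxplot statistics are computed
    scale_y_continuous(limits = c(min(mtcars$mpg), maxval)) +
    # label listing the cut-off values, just below the top of the panel
    geom_text(data = dd, aes(y = maxval, label = outlier_txt),
              size = 3, vjust = 1.5, hjust = -0.5) +
    # short upward arrow showing that values continue beyond the axis
    geom_segment(data = dd, aes(y = maxval * 0.95, yend = maxval,
                                xend = factor(cyl)),
                 arrow = arrow(length = unit(0.1, "cm")))
p2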

[image: the resulting plot]
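
One caveat: limits set through `scale_y_continuous()` drop the values above `maxval` before the boxplot statistics are computed, so the box for that group is drawn from the remaining points only. If the statistics should still include the extreme values, a possible variant is to zoom with `coord_cartesian()` instead. The sketch below is an assumption-laden variation and not part of the original answer; `dat`, `dd2` and `p3` are hypothetical names.

```r
library(ggplot2)

# Copy of mtcars with the same two artificial outliers
dat <- mtcars
dat$mpg[1:2] <- c(50, 60)
maxval <- 40

# One comma-separated label per cylinder group (base R)
dd2 <- aggregate(mpg ~ cyl, data = subset(dat, mpg > maxval),
                 FUN = function(v) paste(v, collapse = ","))
names(dd2)[names(dd2) == "mpg"] <- "outlier_txt"

p3 <- ggplot(dat, aes(factor(cyl), mpg)) +
    geom_boxplot() +
    # zoom only: all values still enter the boxplot statistics
    coord_cartesian(ylim = c(min(dat$mpg), maxval)) +
    geom_text(data = dd2, aes(y = maxval, label = outlier_txt),
              size = 3, vjust = 1.5, hjust = -0.5) +
    geom_segment(data = dd2, aes(y = maxval * 0.95, yend = maxval,
                                 xend = factor(cyl)),
                 arrow = grid::arrow(length = grid::unit(0.1, "cm")))
p3
```

With only seven points in the 6-cylinder group, the two extreme values move the upper hinge and whisker considerably, so this variant shows a much taller box than the limits-based version. Which behaviour is preferable depends on whether the extreme values should influence the summary.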

Ben Bolker answered Sep 26 '22 03:09