I have a very small number of outliers in my dataset, and they make the boxplots difficult to read:
library(ggplot2)
mtcars$mpg[1] <- 60
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()
Hence, I would like to indicate the extreme outliers like this:
Any ideas how to do this in ggplot2? Transforming the axis is not an option for me.
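We can hide outliers in a ggplot2 boxplot by setting the outlier.shape argument of geom_boxplot() to NA. This only suppresses the outlier points; it does not remove them from the data, and ggplot2 does not automatically readjust the y-axis afterwards. To zoom in on the boxes, use coord_cartesian() to restrict the visible y-range, for example to a range based on chosen quantiles.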
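A minimal sketch of that approach, using the mtcars example from the question. The y-range c(10, 35) is an illustrative choice made here, not a value taken from the text; pick whatever range suits your data.

```r
library(ggplot2)

# Same setup as in the question: one artificial extreme value
df <- mtcars
df$mpg[1] <- 60

p_zoom <- ggplot(df, aes(factor(cyl), mpg)) +
  geom_boxplot(outlier.shape = NA) +    # hide outlier points; the data are untouched
  coord_cartesian(ylim = c(10, 35))     # zoom the y-axis without dropping any rows
p_zoom
```

Because coord_cartesian() only zooms, the box statistics are still computed from all rows, including the extreme value.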
We can identify and label these outliers with the ggbetweenstats() function in the ggstatsplot package. To label outliers, set the outlier.tagging argument to TRUE and specify which variable to use to label each outlier.
The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.
This is a start:
library("ggplot2")
mtcars$mpg[1:2] <- c(50,60)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot()
Define max value:
maxval <- 40
Use dplyr (could also be done in base R or plyr) to extract outliers and put together the text string:
library("dplyr")
dd <- mtcars %>%
  filter(mpg > maxval) %>%
  group_by(cyl) %>%
  summarise(outlier_txt = paste(mpg, collapse = ","))
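For reference, here is one way the same summary might be built in base R. This is only a sketch; the names df, out and dd_base are made up for this example.

```r
# Same modified data and cutoff as above
df <- mtcars
df$mpg[1:2] <- c(50, 60)
maxval <- 40

# Keep only the rows above the cutoff, then collapse mpg per cylinder group
out <- df[df$mpg > maxval, ]
dd_base <- aggregate(mpg ~ cyl, data = out,
                     FUN = function(x) paste(x, collapse = ","))

# Rename the collapsed column to match the dplyr version
names(dd_base)[names(dd_base) == "mpg"] <- "outlier_txt"
dd_base
```

Both modified rows belong to the 6-cylinder group, so the result has a single row for cyl 6 whose label combines the two values.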
Set max y value and add an arrow plus label:
library("grid") # needed for arrow() function
p2 <- p + geom_boxplot() +
  scale_y_continuous(limits = c(min(mtcars$mpg), maxval)) +
  geom_text(data = dd, aes(y = maxval, label = outlier_txt),
            size = 3, vjust = 1.5, hjust = -0.5) +
  geom_segment(data = dd, aes(y = maxval * 0.95, yend = maxval,
                              xend = factor(cyl)),
               arrow = arrow(length = unit(0.1, "cm")))
p2
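One thing to be aware of: setting limits in scale_y_continuous() drops the values outside the limits before the boxplot statistics are computed, whereas coord_cartesian() only zooms the view. The sketch below compares the upper hinge under both approaches; the names df, base, p_scale and p_coord are just for illustration.

```r
library(ggplot2)

# Same modified data and cutoff as above
df <- mtcars
df$mpg[1:2] <- c(50, 60)
maxval <- 40

base <- ggplot(df, aes(factor(cyl), mpg)) + geom_boxplot()

# Scale limits: values above maxval are removed before the stats are computed
p_scale <- base + scale_y_continuous(limits = c(min(df$mpg), maxval))

# Coordinate limits: all values are kept; only the visible range changes
p_coord <- base + coord_cartesian(ylim = c(min(df$mpg), maxval))

# Upper hinge per group (rows are ordered cyl 4, 6, 8).
# suppressWarnings() hides the "removed rows" warning from the scale version.
upper_scale <- suppressWarnings(layer_data(p_scale, 1)$upper)
upper_coord <- layer_data(p_coord, 1)$upper

data.frame(cyl = c(4, 6, 8), upper_scale, upper_coord)
```

For the 6-cylinder group the hinge is lower in the scale version, because the two extreme values no longer take part in the calculation. That is what gives the compact boxes in the plot above, but it means those boxes describe the data without the extreme values.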