
Labeling Outliers of Boxplots in R

I have code that creates a boxplot using ggplot in R. I want to label the outliers with the year and Battle.

Here is the code that creates the boxplot:

    require(ggplot2)
    ggplot(seabattle, aes(x=PortugesOutcome,y=RatioPort2Dutch ),xlim="OutCome",  y="Ratio of Portuguese to Dutch/British ships") +
      geom_boxplot(outlier.size=2,outlier.colour="green") +
      stat_summary(fun.y="mean", geom = "point", shape=23, size =3, fill="pink") +
      ggtitle("Portugese Sea Battles")

Can anyone help? I know the plot itself is correct; I just want to label the outliers.

Deborah_Watson asked Nov 04 '15 14:11


People also ask

How do you label outliers on a boxplot in R?

We can identify and label these outliers by using the ggbetweenstats function in the ggstatsplot package. To label outliers, we're specifying the outlier.tagging argument as "TRUE" and we're specifying which variable to use to label each outlier with the outlier.

Does R boxplot show outliers?

boxplot() does not label outliers, but this is easy to program, because boxplot.stats() returns the outlying values (in its out component). You can also add a density plot (barcode plot) to the boxplot.
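As a rough base R sketch of that idea, boxplot.stats() can pick out the outlying values and text() can write a label next to each one. The data and the "obs" labels here are made up for illustration.

```r
# Hypothetical measurements; the last value is far from the rest
vals <- c(4.1, 4.3, 4.4, 4.6, 4.8, 5.0, 5.2, 9.7)
labs <- paste("obs", seq_along(vals))  # hypothetical labels, one per value

# Values that boxplot.stats() reports as lying beyond the whiskers
out_vals <- boxplot.stats(vals)$out
is_out <- vals %in% out_vals

boxplot(vals)
# A single box is drawn at x = 1; pos = 4 places the text to the right of the point
if (any(is_out)) {
  text(x = 1, y = vals[is_out], labels = labs[is_out], pos = 4)
}
```

With several groups, the same steps would be repeated per group, using the box position of each group as x.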

How do you highlight outliers in R?

To highlight outliers in a boxplot, we can create the boxplot with the help of Boxplot function of car package by defining the id.

How do you show outliers in a boxplot?

In a box plot, an outlier is a data point that lies beyond the whiskers: typically, a value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR, where Q1 and Q3 are the lower and upper quartiles and IQR is the interquartile range (Q3 - Q1).
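A small worked example of the 1.5 * IQR rule, using made-up numbers, may make the definition concrete. It assumes R's default quantile() method, which is also what ggplot2's boxplot statistic is based on.

```r
# Hypothetical data with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1 <- unname(quantile(x, 0.25))  # lower quartile
q3 <- unname(quantile(x, 0.75))  # upper quartile
iqr <- q3 - q1                   # interquartile range

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are the ones a boxplot draws as separate points
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5, and only the value 30 falls outside them.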


2 Answers

The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.

Walking through the code: first, define a function, is_outlier, that returns TRUE for each value that is an outlier and FALSE otherwise. Next, group_by the grouping variable (cyl here; PortugesOutcome in your data) so that outliers are determined within each box, and use mutate to add an outlier column that holds the drat value when it is an outlier (drat corresponds to RatioPort2Dutch in your data) and NA otherwise, so that non-outliers get no label. Finally, draw the boxplot and add the labels with geom_text, mapping the label aesthetic to the new column. A negative hjust shifts the text to the right so that the values appear next to, rather than on top of, the outlier points.

    library(dplyr)
    library(ggplot2)

    # TRUE for values more than 1.5 * IQR below Q1 or above Q3
    is_outlier <- function(x) {
      return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
    }

    mtcars %>%
      group_by(cyl) %>%
      # keep drat only for outliers; NA elsewhere so no label is drawn
      mutate(outlier = ifelse(is_outlier(drat), drat, as.numeric(NA))) %>%
      ggplot(., aes(x = factor(cyl), y = drat)) +
        geom_boxplot() +
        geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
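The question asks for labels that identify each outlier (year and Battle) rather than its value. A possible variation, sketched here on mtcars with the car name standing in for those identifiers, stores a text label in the new column instead of the numeric value. The column names car and outlier_label are made up for this example, and the seabattle column names for year and battle are not shown in the question, so they would need to be substituted (for instance by pasting the two columns together).

```r
library(dplyr)
library(ggplot2)

# Same 1.5 * IQR rule as in the answer
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

labelled <- mtcars %>%
  mutate(car = rownames(mtcars)) %>%   # identifier column (stand-in for year/Battle)
  group_by(cyl) %>%                    # outliers are judged within each box
  mutate(outlier_label = ifelse(is_outlier(drat), car, NA_character_)) %>%
  ungroup()

p <- ggplot(labelled, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  # non-outliers have an NA label and are dropped silently
  geom_text(aes(label = outlier_label), na.rm = TRUE, hjust = -0.3)

print(p)
```

Only the contents of the label column change; the grouping and plotting steps stay the same as in the answer.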


JasonAizkalns answered Sep 19 '22 13:09


You can do this simply within ggplot itself, using an appropriate stat_summary call.

    library(ggplot2)

    ggplot(mtcars, aes(x = factor(cyl), y = drat, fill = factor(cyl))) +
      geom_boxplot() +
      stat_summary(
        aes(label = round(after_stat(y), 1)),
        geom = "text",
        # ggplot2 >= 3.3.0 uses `fun` and after_stat(); older releases used `fun.y` and stat()
        fun = function(y) { o <- boxplot.stats(y)$out; if (length(o) == 0) NA else o },
        hjust = -1
      )
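To see what the summary function hands to the text layer, the same boxplot.stats() call can be run per group outside of ggplot. This is only an inspection sketch; the name out_by_cyl is made up for the example.

```r
# Outlying drat values for each cyl group, as a list with one element per group
out_by_cyl <- tapply(mtcars$drat, mtcars$cyl, function(y) boxplot.stats(y)$out)
out_by_cyl
```

Groups with no outliers give an empty vector, which is why the summary function in the answer returns NA in that case. Note that boxplot.stats() works from hinges while geom_boxplot works from quantiles, so in edge cases the labelled values may not match the plotted outlier points exactly.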


Axeman answered Sep 17 '22 13:09