I have code that creates a boxplot using ggplot in R, and I want to label the outliers with the year and battle.
Here is the code that creates the boxplot:
require(ggplot2)

ggplot(seabattle, aes(x = PortugesOutcome, y = RatioPort2Dutch)) +
  geom_boxplot(outlier.size = 2, outlier.colour = "green") +
  # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
  stat_summary(fun = "mean", geom = "point", shape = 23, size = 3, fill = "pink") +
  # axis labels belong in labs(), not in the ggplot() call
  labs(x = "Outcome", y = "Ratio of Portuguese to Dutch/British ships") +
  ggtitle("Portuguese Sea Battles")
Can anyone help? I know the plot itself is correct; I just want to label the outliers.
We can identify and label these outliers by using the ggbetweenstats function in the ggstatsplot package. To label outliers, we set the outlier.tagging argument to TRUE and specify which variable to use to label each outlier with the outlier.label argument.
Base R's boxplot() does not label outliers, but this is quite easy to program, as boxplot.stats() returns the outliers in the out component of its result. You can also add a density plot (barcode plot) to the boxplot.
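As a rough sketch of the boxplot.stats() approach, the following uses the built-in mtcars dataset and labels each outlier with its row name (the car model). The per-group loop and the choice of row names as labels are assumptions made for illustration; with your data you would substitute your own grouping and label columns.

```r
# Split the data by group; the boxes are drawn at x = 1, 2, 3 in the same order
groups <- split(mtcars, mtcars$cyl)

boxplot(drat ~ cyl, data = mtcars)

for (i in seq_along(groups)) {
  g <- groups[[i]]
  # boxplot.stats()$out holds the values drawn as outlier points for this group
  is_out <- g$drat %in% boxplot.stats(g$drat)$out
  if (any(is_out)) {
    # pos = 4 places the label to the right of the point
    text(x = rep(i, sum(is_out)), y = g$drat[is_out],
         labels = rownames(g)[is_out], pos = 4, cex = 0.7)
  }
}
```

Groups without outliers are skipped, so no empty labels are drawn.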
To highlight outliers in a boxplot, we can create the boxplot with the Boxplot function of the car package and set its id argument.
When reviewing a box plot, an outlier is defined as a data point that lies outside the whiskers. By the usual convention, that means a point more than 1.5 times the interquartile range (IQR) below the lower quartile or above the upper quartile, that is, below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR.
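To make the rule concrete, here is a small worked example on a made-up vector (the data are purely illustrative). It uses quantile() with R's default quantile type; boxplot.stats() computes the quartiles as hinges, so its cut-offs can differ slightly on some data.

```r
# Made-up sample with one obviously extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q1  <- unname(quantile(x, 0.25))  # 3.25
q3  <- unname(quantile(x, 0.75))  # 7.75
iqr <- q3 - q1                    # 4.5

lower_fence <- q1 - 1.5 * iqr     # -3.5
upper_fence <- q3 + 1.5 * iqr     # 14.5

# Anything outside the fences counts as an outlier
outliers <- x[x < lower_fence | x > upper_fence]
outliers                          # 30
```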
The following is a reproducible solution that uses dplyr and the built-in mtcars dataset.
Walking through the code: first, create a function, is_outlier, that returns a boolean TRUE/FALSE indicating whether each value passed to it is an outlier. We then perform the check and plot the data. First we group_by our variable (cyl in this example; in your example, this would be PortugesOutcome) and add a variable outlier in the call to mutate: if the drat value is an outlier (this corresponds to RatioPort2Dutch in your example), we keep the drat value; otherwise we return NA so that the value is not plotted. Finally, we plot the results and draw the text values via geom_text with a label aesthetic equal to our new variable. In addition, we offset the text (slide it a bit to the right) with hjust so that the values appear next to, rather than on top of, the outlier points.
library(dplyr)
library(ggplot2)

# TRUE for values beyond 1.5 * IQR from the quartiles
is_outlier <- function(x) {
  return(x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x))
}

mtcars %>%
  group_by(cyl) %>%
  # keep the drat value for outliers, NA for everything else
  mutate(outlier = ifelse(is_outlier(drat), drat, as.numeric(NA))) %>%
  ggplot(., aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  # hjust shifts the labels to the right of the points
  geom_text(aes(label = outlier), na.rm = TRUE, hjust = -0.3)
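Since the question asks for labels built from other columns (the year and battle) rather than the plotted value, the same idea can be adapted by storing a text label instead of the number. The sketch below uses the car names from mtcars as a stand-in. For the seabattle data you would presumably build the label with something like paste(Year, Battle), where Year and Battle are hypothetical column names that the question does not show.

```r
library(dplyr)
library(ggplot2)

# Same 1.5 * IQR rule as above
is_outlier <- function(x) {
  x < quantile(x, 0.25) - 1.5 * IQR(x) | x > quantile(x, 0.75) + 1.5 * IQR(x)
}

labelled <- mtcars %>%
  # car names stand in for an identifying column such as year and battle
  mutate(car = rownames(mtcars)) %>%
  group_by(cyl) %>%
  # text label for outliers, NA (not drawn) for all other rows
  mutate(outlier_label = ifelse(is_outlier(drat), car, NA_character_)) %>%
  ungroup()

p <- ggplot(labelled, aes(x = factor(cyl), y = drat)) +
  geom_boxplot() +
  geom_text(aes(label = outlier_label), na.rm = TRUE, hjust = -0.1, size = 3)

print(p)
```

The row names are copied into a column before group_by because grouping converts the data to a tibble, which drops row names.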
You can also do this within ggplot itself, using an appropriate stat_summary call.
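ggplot(mtcars, aes(x = factor(cyl), y = drat, fill = factor(cyl))) +
  geom_boxplot() +
  stat_summary(
    # after_stat() and fun replace stat() and fun.y, which newer ggplot2 deprecates
    aes(label = round(after_stat(y), 1)),
    geom = "text",
    # return each group's outliers, or NA when there are none
    fun = function(y) { o <- boxplot.stats(y)$out; if (length(o) == 0) NA else o },
    hjust = -1
  )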