I want to draw boxplots in R and label the outliers with names. So far I have found this solution.
The function there provides all the functionality I need, but it assigns the labels incorrectly. In the following example, it marks the outlier as "u" instead of "o":
library(plyr)
library(TeachingDemos)
source("http://www.r-statistics.com/wp-content/uploads/2011/01/boxplot-with-outlier-label-r.txt") # Load the function
set.seed(1500)
y <- rnorm(20)
x1 <- sample(letters[1:2], 20, TRUE)
lab_y <- sample(letters, 20)
# plot a boxplot with interactions:
boxplot.with.outlier.label(y~x1, lab_y)
Do you know of any solution? The ggplot2 library is very nice, but as far as I know it provides no such functionality. My alternative is to extract the outlier information from the boxplot object and place the labels with the text() function. However, labels placed that way may overlap.
Thanks a lot :-)
I took a look at this with debug(boxplot.with.outlier.label), and it turns out there's a bug in the function.
The error occurs on line 125, where the data.frame DATA is constructed from x, y and label_name.
Before that point, x and y have been reordered, while the labels (your lab_y) haven't been. When the supplied value of x (your x1) isn't itself already in order, you'll get the kind of jumbling you experienced.
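To make the mismatch concrete, here is a minimal sketch with toy data. It assumes the reordering inside the function amounts to sorting x and y by order(x); the names bad and good are only for illustration and do not appear in the function.

```r
# Toy data: three points, with the groups deliberately out of order
x <- c("b", "a", "b")
y <- c(10, 1, 12)
label_name <- c("p", "q", "r")   # "q" belongs to the point with y = 1

ord <- order(x)

# Assumed behaviour of the buggy line: x and y are sorted, labels are not
bad <- data.frame(x = x[ord], y = y[ord], label_name = label_name,
                  stringsAsFactors = FALSE)

# Aligned version: the labels are reordered with the same index
good <- data.frame(x = x[ord], y = y[ord], label_name = label_name[ord],
                   stringsAsFactors = FALSE)

print(bad)
print(good)
```

In bad, the point with y = 1 ends up next to the label of a different point, which is the same kind of jumbling as in your plot.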
As an immediate fix, you can pre-order the data by x like this (or do something more elegant):
df <- data.frame(y, x1, lab_y, stringsAsFactors=FALSE)
df <- df[order(df$x1), ]
# Needed since lab_y is not searched for in data (though it probably should be)
lab_y <- df$lab_y
boxplot.with.outlier.label(y~x1, lab_y, data=df)
Intelligent placement of point labels is a separate issue, discussed here or here. There is no single ideal solution, so you just have to pick one of the approaches there.
You would then overplot the normal boxplot with labels, as follows:
set.seed(1501)
y <- c(4, 0, 7, -5, rnorm(16))
x1 <- c("a", "a", "b", "b", sample(letters[1:2], 16, TRUE))
lab_y <- sample(letters, 20)
bx <- boxplot(y~x1)
out_lab <- c()
for (i in seq_along(bx$out)) {
out_lab[i] <- lab_y[which(y == bx$out[i])[1]]
}
identify(bx$group, bx$out, labels = out_lab, cex = 0.7)
Then, while identify() is running, you just click where you want each label to be positioned, as described here. When finished, you just press "STOP".
Note that more than one observation can share an outlier's y value, so an outlier may match more than one label! In my solution, I simply picked the first.
PS: I feel ashamed of the for loop, but I don't know how to vectorize it. Feel free to post an improvement.
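One possible way to vectorize the lookup is match(), which returns the position of the first match for each element, so it should give the same "pick the first" behaviour as the loop. This is only a sketch on made-up, deterministic data (not the random data above), and it uses plot = FALSE so that no graphics device is needed.

```r
# Made-up data with one clear outlier per group
y <- c(10, 0.1, 0.2, 0.3, 0.4, 0.5, -10, 1.1, 1.2, 1.3, 1.4, 1.5)
x1 <- c(rep("a", 6), rep("b", 6))
lab_y <- letters[1:12]

# Compute the boxplot statistics without drawing anything
bx <- boxplot(y ~ x1, plot = FALSE)

# Loop version, as in the answer
out_lab_loop <- character(0)
for (i in seq_along(bx$out)) {
  out_lab_loop[i] <- lab_y[which(y == bx$out[i])[1]]
}

# Vectorized version: match() gives the index of the first equal value in y
out_lab <- lab_y[match(bx$out, y)]

print(out_lab)
```

Like the loop, this matches on the y value only, so an equal value in another group would be found first if it appears earlier in y.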
EDIT: inspired by Federico's link, I now see it can be done much more easily! Just these 2 commands:
boxplot(y~x1)
identify(as.integer(as.factor(x1)), y, labels = lab_y, cex = 0.7)
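The reason as.integer(as.factor(x1)) works as the x coordinate is that boxplot() draws the groups at positions 1, 2, ... in factor-level order, and the integer codes of a factor follow that same order. A small sketch with a toy vector (the name grp is only for illustration):

```r
# Toy grouping vector, deliberately not sorted
grp <- c("b", "a", "b", "a")

# Factor levels are sorted alphabetically by default: "a" then "b"
f <- as.factor(grp)
print(levels(f))

# The integer codes are the x positions boxplot() uses for each group
x_pos <- as.integer(f)
print(x_pos)
```

So every point, outlier or not, gets the x position of its own box, and identify() can then pick whichever points you click on.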