I'm trying to draw a boxplot with ggplot2, and I want the middle line to show the mean instead of the median.
I know people have asked similar questions before, but I'm asking because the solution didn't work for me. Specifically, I followed the first solution in this accepted answer.
This is what I did with mpg test data:
library(ggplot2)
library(tidyverse)
mpg %>%
  ggplot(aes(x = class, y = cty, middle = mean(cty))) +
  geom_boxplot()
It has no effect.
graph plotting mean:

graph plotting with default median:

Can anyone point out what I did wrong? Thanks.
Trying this with another dataset, mtcars, shows the same thing: mapping middle doesn't change the plot, even though the mean and median differ more there. The likely reason is that geom_boxplot() uses stat_boxplot by default, which computes middle (the median) itself and so overrides the mapped value. Another option is stat_summary. I couldn't get the outlier points function to work quite right, and I had to tweak it to avoid an "arguments imply differing number of rows: 1, 0" error.
BoxMeanQuant <- function(x) {
  v <- c(min(x), quantile(x, 0.25), mean(x), quantile(x, 0.75), max(x))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}
mpg %>%
  ggplot(aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot")

Compare that with the normal geom_boxplot, which ignores the mapped middle:
mpg %>%
  ggplot(aes(x = class, y = cty)) +
  geom_boxplot(aes(middle = mean(cty)))
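To check this numerically rather than by eye, one option is to pull the values ggplot2 actually computed for the layer with layer_data() and compare them with per-class medians and means. This is only a sketch: the names p, computed and by_class are made up for illustration, and it assumes the rows of layer_data() come back in the same alphabetical class order that group_by() produces.

```r
library(ggplot2)
library(dplyr)

# Same plot as above, with middle mapped to the mean
p <- ggplot(mpg, aes(x = class, y = cty, middle = mean(cty))) +
  geom_boxplot()

# Values computed by the layer's stat: one row per box
computed <- layer_data(p)

# Per-class median and mean, for comparison
by_class <- mpg %>%
  group_by(class) %>%
  summarize(median_cty = median(cty), mean_cty = mean(cty))

# Side-by-side view: plotted_middle should track the median column
data.frame(
  class          = by_class$class,
  plotted_middle = computed$middle,
  median         = by_class$median_cty,
  mean           = by_class$mean_cty
)
```

If plotted_middle matches the median column and not the mean column, the mapped middle was replaced by the stat's own value.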

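This is what I used to plot the outliers as points. Here the whiskers run from the 10th to the 90th percentile and anything outside them is drawn as a point, which differs from the geom_boxplot default (points more than 1.5 × IQR beyond the hinges), so adjust as necessary. Without the if-else, it would throw an error.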
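# Box statistics: 10th/90th percentile whiskers, quartile hinges, mean as the middle line
BoxMeanQuant <- function(x) {
  v <- c(quantile(x, 0.1), quantile(x, 0.25), mean(x), quantile(x, 0.75), quantile(x, 0.9))
  names(v) <- c("ymin", "lower", "middle", "upper", "ymax")
  v
}
# Points outside the 10th-90th percentile range; NA for small groups to avoid the row-count error
outliers <- function(x) {
  if (length(x) > 5) {
    subset(x, x < quantile(x, 0.1) | quantile(x, 0.9) < x)
  } else {
    return(NA)
  }
}
ggplot(data = mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = BoxMeanQuant, geom = "boxplot") +
  # fun.y was deprecated in ggplot2 3.3.0; fun is its replacement
  stat_summary(fun = outliers, geom = "point")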
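If you would rather keep whiskers and outliers close to the geom_boxplot default while still drawing the mean, a variant along these lines might work. It is a sketch under assumptions: box_mean_tukey, is_tukey_outlier and outlier_df are hypothetical names, the 1.5 × IQR rule is applied to quantile()-based hinges, and the outliers are filtered into their own data frame and drawn with geom_point instead of a second stat_summary.

```r
library(ggplot2)
library(dplyr)

# Hypothetical helper: TRUE for values more than 1.5 * IQR beyond the quartiles
is_tukey_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Hypothetical helper: box statistics with the mean as the middle line
# and whiskers ending at the most extreme non-outlying values
box_mean_tukey <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  inside <- x[!is_tukey_outlier(x)]
  data.frame(
    ymin   = min(inside),
    lower  = q[1],
    middle = mean(x),
    upper  = q[2],
    ymax   = max(inside)
  )
}

# Outlying rows, computed per class
outlier_df <- mpg %>%
  group_by(class) %>%
  filter(is_tukey_outlier(cty)) %>%
  ungroup()

ggplot(mpg, aes(x = class, y = cty)) +
  stat_summary(fun.data = box_mean_tukey, geom = "boxplot") +
  geom_point(data = outlier_df)
```

Filtering the outliers up front avoids the if-else workaround, because an empty outlier_df simply draws no points.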

In the end I had to create a summary data frame to do this. It is not what I was originally looking for, but it works.
df <- mpg %>%
  group_by(class) %>%
  summarize(
    ymin = min(cty),
    ymax = max(cty),
    lower = quantile(cty, 0.25),
    upper = quantile(cty, 0.75),
    middle = mean(cty)
  )
df %>%
  ggplot(aes(class)) +
  geom_boxplot(
    aes(ymin = ymin, ymax = ymax, lower = lower, upper = upper, middle = middle),
    stat = 'identity'
  )