I am trying to create a boxplot for 2 groups across several factors, along with labels for the number of observations. When there are no observations for one group at one factor level, the box for the group with observations takes up the space of both and looks odd.
Minimal example:
library(tidyverse)
mtcars %>%
  select(mpg, cyl, am) %>%
  filter(!(cyl == 8 & am == 0)) %>%
  ggplot(aes(factor(cyl), mpg, fill = factor(am))) +
  stat_boxplot(geom = "errorbar") + ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge(width = 0.75)) +
  geom_text(data = mtcars %>%
              select(mpg, cyl, am) %>%
              filter(!(cyl == 8 & am == 0)) %>%
              group_by(cyl, am) %>%
              summarize(Count = n(),
                        q3 = quantile(mpg, 0.75),
                        iqr = IQR(mpg),
                        lab_pos = max(ifelse(mpg < q3 + 1.5 * iqr, mpg, NA), na.rm = TRUE)),
            aes(x = factor(cyl), y = lab_pos, label = paste0("n = ", Count, "\n")),
            position = position_dodge(width = 0.75))
Which produces:
Is there a way to make the box for am(1) at cyl(8) half the width, so it's consistent with the other boxes on the plot? I have tried to use fake data, but that results in a count label for am(0) at cyl(8).
In some box plots, the width of each box indicates the size of the sample: the wider the box, the larger the sample. Most statistical software offers this as an option, so not all box plots have widths proportional to the sample size.
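As a minimal sketch of that option in ggplot2, `geom_boxplot()` takes a `varwidth` argument; when it is `TRUE`, box widths are drawn proportional to the square root of the number of observations in each group. The object names `p_var` and `n_per_cyl` are made up for this illustration.

```r
library(ggplot2)

# Number of cars per cylinder class, for comparison with the box widths
n_per_cyl <- table(mtcars$cyl)

# varwidth = TRUE scales each box by the square root of its group size
p_var <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(varwidth = TRUE)

print(n_per_cyl)
print(p_var)
```

Base R's `boxplot()` has an argument of the same name, so `boxplot(mpg ~ cyl, data = mtcars, varwidth = TRUE)` should give a comparable picture.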
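To change the axis limits on a plot in base R, we can pass the xlim and ylim arguments to the plotting function (for example plot() or boxplot()). In ggplot2, the xlim() and ylim() functions serve the same purpose.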
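A short sketch of setting the y-axis limits both ways, using `mtcars` as in the question. One caveat worth knowing for boxplots: in ggplot2, `ylim()` drops observations outside the limits before the box statistics are computed, whereas `coord_cartesian()` only zooms the view. The names `p`, `p_clip` and `p_zoom` are illustrative.

```r
library(ggplot2)

# Base R: limits are arguments of the plotting function
boxplot(mpg ~ cyl, data = mtcars, ylim = c(0, 40))

p <- ggplot(mtcars, aes(factor(cyl), mpg)) + geom_boxplot()

# ylim() removes values outside the range, so the boxes are recomputed
p_clip <- p + ylim(15, 40)

# coord_cartesian() keeps all data and only changes the visible window
p_zoom <- p + coord_cartesian(ylim = c(15, 40))
```

If the boxes should still describe the full data, `coord_cartesian()` is probably the safer choice.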
The median (middle quartile) marks the mid-point of the data and is shown by the line that divides the box into two parts. Half the scores are greater than or equal to this value and half are less. The middle “box” represents the middle 50% of scores for the group.
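To make the description concrete, here is a small sketch that computes the median and the quartiles bounding the box for one group, the 4-cylinder cars in `mtcars`. The variable names are chosen for this example only.

```r
# mpg values of the 4-cylinder cars
mpg4 <- mtcars$mpg[mtcars$cyl == 4]

# The line inside the box
med <- median(mpg4)

# Lower and upper edges of the box (first and third quartiles)
box_edges <- quantile(mpg4, c(0.25, 0.75))

# Share of observations that fall inside the box
inside_box <- mean(mpg4 >= box_edges[1] & mpg4 <= box_edges[2])

print(med)
print(box_edges)
print(inside_box)
```

With small groups and tied values the share inside the box will not be exactly 50%, which is expected.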
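To reorder the boxes, we can use the reorder() function, which is part of base R (the stats package) and can be called inside ggplot2's aes(). By default, ggplot2 orders character groups alphabetically and factors by their level order.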
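A minimal sketch of reordering, assuming we want the cylinder groups sorted by their median mpg. The names `cyl_by_median` and `p_ordered` are illustrative.

```r
library(ggplot2)

# reorder() sorts the factor levels by a summary of a second variable,
# here the median mpg of each cylinder group (ascending)
cyl_by_median <- with(mtcars, reorder(factor(cyl), mpg, FUN = median))
print(levels(cyl_by_median))

# The same call can be used directly inside aes()
p_ordered <- ggplot(mtcars, aes(reorder(factor(cyl), mpg, FUN = median), mpg)) +
  geom_boxplot() +
  xlab("cyl (ordered by median mpg)")
print(p_ordered)
```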
I was able to get a reasonable solution to this by installing the latest version of ggplot2 from GitHub and using position_dodge2 with preserve = "single".
# Install devtools
install.packages("devtools")
# Install dependencies of the scales package
install.packages(c("RColorBrewer", "stringr", "dichromat",
                   "munsell", "plyr", "colorspace"))
# Load devtools
library(devtools)
# Switch to development mode.
# This installs scales and ggplot2 in the "~/R-dev" directory,
# so the CRAN version of ggplot2 is not removed.
dev_mode(TRUE)
# Install scales (the repository now lives under r-lib)
install_github("r-lib/scales")
# Install the development version of ggplot2 (the repository now lives under tidyverse)
install_github("tidyverse/ggplot2")
# Load the development version of ggplot2
library(dplyr)
library(ggplot2)
mtcars %>%
  select(mpg, cyl, am) %>%
  filter(!(cyl == 8 & am == 0)) %>%
  ggplot(aes(factor(cyl), mpg, fill = factor(am))) +
  stat_boxplot(geom = "errorbar",
               position = position_dodge2(width = 0.75, preserve = "single")) +
  geom_boxplot(outlier.shape = 1, outlier.size = 3,
               position = position_dodge2(width = 0.75, preserve = "single")) +
  geom_text(data = mtcars %>%
              select(mpg, cyl, am) %>%
              filter(!(cyl == 8 & am == 0)) %>%
              group_by(cyl, am) %>%
              summarize(Count = n(),
                        q3 = quantile(mpg, 0.75),
                        iqr = IQR(mpg),
                        lab_pos = max(mpg)),
            aes(x = factor(cyl), y = lab_pos, label = paste0("n = ", Count, "\n")),
            position = position_dodge2(width = 0.75, preserve = "single"))
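One way to check that preserve = "single" does what the question asks is to inspect the computed box positions rather than judge the plot by eye. The sketch below is not part of the original answer: `box_widths()` is a hypothetical helper written for this illustration, and it assumes a ggplot2 version that ships `position_dodge2()` and `layer_data()`.

```r
library(ggplot2)

# Same subset as in the question: no automatic (am == 0) cars with 8 cylinders
cars <- subset(mtcars, !(cyl == 8 & am == 0))

# Hypothetical helper: drawn width of every box for a given preserve setting
box_widths <- function(preserve) {
  p <- ggplot(cars, aes(factor(cyl), mpg, fill = factor(am))) +
    geom_boxplot(position = position_dodge2(width = 0.75, preserve = preserve))
  d <- layer_data(p, 1)          # one row per box, with xmin and xmax
  round(d$xmax - d$xmin, 6)
}

w_single <- box_widths("single") # every box should have the same width
w_total  <- box_widths("total")  # the lone cyl = 8 box should be wider

print(w_single)
print(w_total)
```

If the widths under "single" are all equal while "total" shows one wider box, the lone am(1) box at cyl(8) is being drawn at the same width as the others.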