How can I maintain consistent box width in a boxplot when a factor*group combination has no observations?

Tags: r, dplyr, ggplot2

I am trying to create a boxplot for 2 groups across several factor levels, along with labels for the number of observations. When one group has no observations at a factor level, the box for the group that does have observations takes up the space of both and looks odd.

Minimal example:

library(tidyverse)

mtcars %>%
  select(mpg, cyl,am) %>%
  filter(!(cyl == 8 & am == 0)) %>%
  ggplot(aes(factor(cyl),mpg,fill=factor(am))) + 
  stat_boxplot(geom = "errorbar") + ## Draw horizontal lines across ends of whiskers
  geom_boxplot(outlier.shape=1, outlier.size=3, 
               position =  position_dodge(width = 0.75)) +
  geom_text(data = mtcars %>%
              select(mpg, cyl, am) %>%
              filter(!(cyl == 8 & am == 0)) %>%
              group_by(cyl, am) %>%
              summarize(Count = n(),
                        q3 = quantile(mpg, 0.75),
                        iqr = IQR(mpg),
                        ## Label height: largest mpg below the upper fence q3 + 1.5 * IQR
                        lab_pos = max(ifelse(mpg < q3 + 1.5 * iqr, mpg, NA), na.rm = TRUE)),
            aes(x = factor(cyl), y = lab_pos, label = paste0("n = ", Count, "\n")),
            position = position_dodge(width = 0.75))

Which produces:

[Image: the resulting boxplot, in which the single am = 1 box at cyl = 8 is twice as wide as the other boxes]

Is there a way to make the box for am = 1 at cyl = 8 half the width, so that it is consistent with the other boxes on the plot? I have tried adding fake data, but that results in a count label for am = 0 at cyl = 8.

asked Dec 11 '17 at 04:12 by JLC


1 Answer

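I was able to get a reasonable solution to this by installing the development version of ggplot2 from GitHub and using position_dodge2 with preserve = "single". Note that the default for position_dodge2 is preserve = "total", so the argument has to be set explicitly. position_dodge2 is now also part of the CRAN release of ggplot2, so with a current installation the development setup that follows can be skipped.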

# Install devtools
install.packages('devtools')

# Install dependency of scales package
install.packages(c("RColorBrewer", "stringr", "dichromat", 
                   "munsell", "plyr", "colorspace"))

# Load devtools
library(devtools)

# Move to development mode
# This installs scales and ggplot2 in the "~/R-dev" directory,
# so the CRAN version of ggplot2 is not removed.
dev_mode(TRUE)

# Install scales
install_github("hadley/scales")

# Development version of ggplot2; the repository has since moved
# from hadley/ggplot2 to tidyverse/ggplot2
install_github("tidyverse/ggplot2")

# Load dplyr and the development version of ggplot2
library(dplyr)
library(ggplot2)

mtcars %>%
  select(mpg, cyl,am) %>%
  filter(!(cyl == 8 & am == 0)) %>%
  ggplot(aes(factor(cyl),mpg,fill=factor(am))) + 
  stat_boxplot(geom = "errorbar",
               position =  position_dodge2(width = 0.75, preserve = "single")) + 
  geom_boxplot(outlier.shape=1, outlier.size=3, 
               position =  position_dodge2(width = 0.75, preserve = "single")) +
  geom_text(data = mtcars %>%
              select(mpg, cyl, am) %>%
              filter(!(cyl == 8 & am == 0)) %>%
              group_by(cyl, am) %>%
              summarize(Count = n(),
                        q3 = quantile(mpg, 0.75),
                        iqr = IQR(mpg),
                        lab_pos = max(mpg)),
            aes(x= factor(cyl), y = lab_pos,label = paste0("n = ",Count, "\n")),
            position = position_dodge2(width = 0.75, preserve = "single"))

[Image: the resulting boxplot, with all boxes the same width]
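
For readers on a current CRAN release of ggplot2, the same idea can be sketched without the development install. This is an illustration rather than part of the original answer: it assumes the installed ggplot2 provides position_dodge2, and the names cars, counts, dodge, boxes and box_widths are introduced here only for the example. It computes the labels once, shares a single position object across all layers, and reads the computed box geometry back with layer_data() so the widths can be checked.

```r
library(dplyr)
library(ggplot2)

# Same subset as in the question: drop the cyl == 8, am == 0 combination
cars <- mtcars %>%
  select(mpg, cyl, am) %>%
  filter(!(cyl == 8 & am == 0))

# Compute the count labels once instead of repeating the pipeline in geom_text()
counts <- cars %>%
  group_by(cyl, am) %>%
  summarize(Count = n(), lab_pos = max(mpg)) %>%
  ungroup()

# One position object shared by every layer, so boxes, whisker caps
# and labels are dodged in the same way
dodge <- position_dodge2(width = 0.75, preserve = "single")

p <- ggplot(cars, aes(factor(cyl), mpg, fill = factor(am))) +
  stat_boxplot(geom = "errorbar", position = dodge) +
  geom_boxplot(outlier.shape = 1, outlier.size = 3, position = dodge) +
  geom_text(data = counts,
            aes(y = lab_pos, label = paste0("n = ", Count, "\n")),
            position = dodge)

# Inspect the computed geometry of the boxes (layer 2 is geom_boxplot)
boxes <- layer_data(p, 2)
box_widths <- boxes$xmax - boxes$xmin
```

If the dodge behaves as intended, the values in box_widths should all be equal, including the one for the lone am = 1 box at cyl = 8. Calling print(p) draws the plot.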

answered Sep 23 '22 at 05:09 by JLC