 

How to make a base R style boxplot using ggplot2?

Tags: r, ggplot2, boxplot

I need to make a lot of boxplots for an upcoming publication. I would like to use ggplot2 because I think it will be more flexible for future projects, but my PI insists that I make these plots in the style of base R. He specifically wants the dashed whisker lines, so that the new plots look similar to ones we made previously. Here is an example using the iris dataset, made with this code:

plot(iris$Species,
     iris$Sepal.Length,
     xlab='Species',
     ylab='Sepal Length',
     main='Sepal Variation Across Species',
     col='white')

[Image: base R plot]
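For reference: plot() with a factor on the x axis dispatches to a boxplot, so the same figure can be drawn with boxplot() directly. The sketch below spells out styling arguments that, as far as I know, already match bxp()'s defaults (dashed whiskers, solid staples, open-circle outliers); writing them out makes explicit which pieces a ggplot2 version has to reproduce. The variable name `res` is just illustrative.

```r
# Explicit base R equivalent of plot(iris$Species, iris$Sepal.Length, ...).
# The line/point arguments restate (what I believe are) bxp() defaults,
# so the target look is visible in the code.
res <- boxplot(Sepal.Length ~ Species, data = iris,
               xlab = "Species", ylab = "Sepal Length",
               main = "Sepal Variation Across Species",
               col = "white",
               whisklty = "dashed",  # dashed vertical whiskers
               staplelty = "solid",  # solid horizontal caps at the whisker ends
               outpch = 1)           # open circles for outliers

# boxplot() also returns the statistics it drew: one column per species;
# rows = lower whisker end, lower hinge, median, upper hinge, upper whisker end.
print(res$stats)
```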

My question is: how can I make a similar-looking plot using ggplot2?

Here is my attempt:

library("ggplot2")
ggplot(iris) +
  geom_boxplot(aes(x=Species,y=Sepal.Length),linetype="dashed") +
  ggtitle("Sepal Variation Across Species")

[Image: ggplot attempt]

I need the combination of dashed and solid lines, but I cannot get anything to work. I have already checked https://stats.stackexchange.com/questions/8137/how-to-add-horizontal-lines-to-ggplot2-boxplot, which is very close but has no dashed lines, which we need. Also, the outliers are drawn as filled circles, whereas base R draws open circles.

Ravi Mann, asked Nov 06 '18 at 10:11




3 Answers

To generate a "base R style" boxplot using ggplot2, we can stack four boxplot-based layers on top of one another. The order matters, so please keep this in mind if you modify the code. I strongly suggest exploring the code by plotting each layer on its own; that way you can get a feel for how the layers interact.

The layers are ordered like this (from bottom to top):

  • (1) a complete boxplot drawn with dashed lines, which supplies the dashed vertical whiskers
  • (2) a solid box containing the median line, which covers the dashed box from (1)
  • (3) & (4) solid horizontal caps at the whisker ends, created with error bars whose minimum is set to the maximum, and vice versa, so each error bar collapses to a single horizontal line
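To follow the suggestion of plotting each layer on its own, here is a minimal sketch that keeps the four layers in a named list and builds one plot per layer plus cumulative stacks, so they can be printed one at a time. The names `layers`, `base_plot`, `single_layer_plots` and `cumulative_plots` are my own, purely illustrative. It uses after_stat(), the current spelling of the older `..lower..` notation (ggplot2 >= 3.3.0).

```r
library(ggplot2)

# The four layers described above, kept in a named list (names are illustrative)
# so each can be drawn on its own.
layers <- list(
  dashed_box   = geom_boxplot(linetype = "dashed", outlier.shape = 1),
  solid_box    = stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)),
                              outlier.shape = 1),
  upper_staple = stat_boxplot(geom = "errorbar", aes(ymin = after_stat(ymax))),
  lower_staple = stat_boxplot(geom = "errorbar", aes(ymax = after_stat(ymin)))
)

base_plot <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + theme_classic()

# One plot per layer, e.g. print(single_layer_plots$upper_staple)
single_layer_plots <- lapply(layers, function(l) base_plot + l)

# Cumulative stacks: layer 1, layers 1-2, layers 1-3, layers 1-4
cumulative_plots <- lapply(seq_along(layers), function(i) base_plot + layers[seq_len(i)])
```

Printing `cumulative_plots[[1]]` through `cumulative_plots[[4]]` in turn shows each solid layer covering part of the dashed one.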

I also added custom breaks to match your base R plot; change them as needed. panel.border draws a thin border around the panel in the style of base R, and outlier.shape = 1 gives the open circles you want.

The code:

library("ggplot2")

ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
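  geom_boxplot(linetype = "dashed", outlier.shape = 1) +            # (1) dashed box and whiskers
  stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)),
               outlier.shape = 1) +                                 # (2) solid box and median
  stat_boxplot(geom = "errorbar", aes(ymin = after_stat(ymax))) +   # (3) solid upper whisker cap
  stat_boxplot(geom = "errorbar", aes(ymax = after_stat(ymin))) +   # (4) solid lower whisker cap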
  scale_y_continuous(breaks = seq(4.5, 8.0, 0.5)) +
  labs(title = "Sepal Variation Across Species",
       x = "Species",
       y = "Sepal Length") +
  theme_classic() + # remove panel background and gridlines
  theme(plot.title = element_text(hjust = 0.5,  # hjust = 0.5 centers the title
                                  size = 14,
                                  face = "bold"),
        panel.border = element_rect(linetype = "solid", colour = "black",
                                    fill = NA, linewidth = 0.5)) # linewidth needs ggplot2 >= 3.4.0

The plot:

[Image: resulting ggplot2 boxplot]

It is not exactly the same, but it seems to be a decent approximation. Hopefully this is close enough for your needs. Good luck, and happy plotting!
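One likely source of small remaining differences is how the box itself is computed: as far as I know, base R's boxplot.stats() takes the hinges from fivenum(), while ggplot2's stat_boxplot() uses quantile() with its default type 7, so the box edges (and therefore the 1.5 * IQR whisker limits) can differ slightly. This sketch pulls both sets of numbers out for iris so you can check whether it matters for your data; the names `gg_stats`, `base_stats` and `comparison` are illustrative.

```r
library(ggplot2)

# Statistics ggplot2 actually computes for each box (one row per Species).
gg_stats <- layer_data(
  ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()
)[, c("ymin", "lower", "middle", "upper", "ymax")]

# Statistics base R uses; plot = FALSE skips drawing. After t(), rows are
# species and columns are: lower whisker, lower hinge, median, upper hinge, upper whisker.
base_stats <- t(boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)$stats)

# Side-by-side hinges; any mismatch comes from fivenum() vs quantile(type = 7).
comparison <- data.frame(
  Species    = levels(iris$Species),
  gg_lower   = gg_stats$lower, base_lower = base_stats[, 2],
  gg_upper   = gg_stats$upper, base_upper = base_stats[, 4]
)
print(comparison)
```

Both methods agree on the median, so any visible mismatch should be confined to the box edges and whisker ends.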

Marcus Campbell, answered Oct 13 '22 at 04:10


Building further on what @Marcus and @Moody_Mudskipper have provided:

geom_boxplotMod <- function(mapping = NULL, data = NULL, stat = "boxplot", 
    position = "dodge2", ..., outlier.colour = NULL, outlier.color = NULL, 
    outlier.fill = NULL, outlier.shape = 1, outlier.size = 1.5, 
    outlier.stroke = 0.5, outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5,
    varwidth = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
    linetype = "dashed") # to see where these arguments come from, run: args(geom_boxplot)
    {
    list(geom_boxplot(
            mapping = mapping, data = data, stat = stat, position = position,
            outlier.colour = outlier.colour, outlier.color = outlier.color, 
            outlier.fill = outlier.fill, outlier.shape = outlier.shape, 
            outlier.size = outlier.size, outlier.stroke = outlier.stroke, 
            outlier.alpha = outlier.alpha, notch = notch, 
            notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm, 
            show.legend = show.legend, inherit.aes = inherit.aes, linetype = 
            linetype, ...),
        # solid whisker caps; the width of the error-bar heads is decreased
        stat_boxplot(geom = "errorbar", aes(ymin = after_stat(ymax)), width = 0.25),
        stat_boxplot(geom = "errorbar", aes(ymax = after_stat(ymin)), width = 0.25),
        # solid box and median line drawn over the dashed box
        stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)),
            outlier.shape = 1),
        theme(panel.background = element_blank(),
            panel.border = element_rect(linewidth = 1.5, fill = NA), # ggplot2 >= 3.4.0
            plot.title = element_text(hjust = 0.5),
            axis.title = element_text(size = 12),
            axis.text = element_text(size = 10.5))
        )
    }

library(ggplot2)
ggplot(iris, aes(x=Species,y=Sepal.Length, colour = Species)) +
    geom_boxplotMod() +
    ggtitle("Sepal Variation Across Species")

Created on 2020-07-20 by the reprex package (v0.3.0)

massisenergy, answered Oct 13 '22 at 03:10


Here's a wrapper around @Marcus' great solution, for convenient use and more flexibility:

geom_boxplot2 <- function(mapping = NULL, data = NULL, stat = "boxplot", position = "dodge2", 
                          ..., outlier.colour = NULL, outlier.color = NULL, outlier.fill = NULL, 
                          outlier.shape = 1, outlier.size = 1.5, outlier.stroke = 0.5, 
                          outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5, varwidth = FALSE, 
                          na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
                          linetype = "dashed"){
  list(
    geom_boxplot(mapping = mapping, data = data, stat = stat, position = position,
                 outlier.colour = outlier.colour, outlier.color = outlier.color, 
                 outlier.fill = outlier.fill, outlier.shape = outlier.shape, 
                 outlier.size = outlier.size, outlier.stroke = outlier.stroke, 
                 outlier.alpha = outlier.alpha, notch = notch, 
                 notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm, 
                 show.legend = show.legend, inherit.aes = inherit.aes, 
                 linetype = linetype, ...),
    stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)), outlier.shape = 1),
    stat_boxplot(geom = "errorbar", aes(ymin = after_stat(ymax))),
    stat_boxplot(geom = "errorbar", aes(ymax = after_stat(ymin))),
    theme_classic(), # remove panel background and gridlines
    theme(plot.title = element_text(hjust = 0.5,  # hjust = 0.5 centers the title
                                    size = 14,
                                    face = "bold"),
          panel.border = element_rect(linetype = "solid", colour = "black",
                                      fill = NA, linewidth = 0.5)) # linewidth needs ggplot2 >= 3.4.0
  )
}

ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot2() +
  scale_y_continuous(breaks = seq(4.5, 8.0, 0.5)) + # not sure how to generalize this
  labs(title = "Sepal Variation Across Species", y = "Sepal Length")

Moody_Mudskipper, answered Oct 13 '22 at 05:10