I need to make a lot of boxplots for an upcoming publication. I would like to use ggplot2 because I think it will be more flexible for future projects, but my PI is insisting that I make these plots in the style of base-R. He specifically wants the dashed lines, so that they will appear similar to previous plots we made. I have made an example using the iris dataset to show you, using this code:
plot(iris$Species,
     iris$Sepal.Length,
     xlab='Species',
     ylab='Sepal Length',
     main='Sepal Variation Across Species',
     col='white')
My question is how to make a similar looking plot using ggplot2?
Here is my attempt:
library("ggplot2")
ggplot(iris) +
  geom_boxplot(aes(x=Species,y=Sepal.Length),linetype="dashed") +
  ggtitle("Sepal Variation Across Species")
I need the combination of dashed and solid lines, but I cannot get anything to work. I have already checked https://stats.stackexchange.com/questions/8137/how-to-add-horizontal-lines-to-ggplot2-boxplot, which comes very close but has no dashed lines, which we need. Also, the outliers are drawn as filled circles, unlike the open circles of base R.
In ggplot2, geom_boxplot() draws a boxplot. To create a basic one, load the required libraries and the dataset, pass the data and the aesthetic mappings to ggplot(), and add geom_boxplot().
To show mean values on a ggplot2 boxplot, use stat_summary(), which computes a summary statistic and adds it to the plot as an extra layer on top of the ggplot() call.
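As a minimal sketch of the stat_summary() approach, assuming ggplot2 3.3.0 or later (older releases spelled the argument fun.y rather than fun), the mean of each group can be overlaid as a point. The marker shape, size and colour are arbitrary choices for illustration:

```r
library(ggplot2)

# Boxplot of sepal length by species with the group mean overlaid as a cross
p_mean <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # fun = mean computes one value per x group; geom = "point" draws it
  stat_summary(fun = mean, geom = "point",
               shape = 4, size = 3, colour = "red")

print(p_mean)
```

The mean sits in its own layer, so it can be restyled or removed without touching the boxplot.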
To reorder the boxes, use reorder(), a base R function from the stats package that can be called inside aes(). By default, ggplot2 draws groups in factor-level order, which is alphabetical for character data. For a clearer plot it often helps to sort the groups by a summary of the data, in increasing or decreasing order, and reorder() does this.
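A short sketch of reorder() in use, sorting the species by decreasing median sepal length. The choice of the median as the summary and the negated values for descending order are assumptions made for this example:

```r
library(ggplot2)

# reorder() returns a factor whose levels are sorted by FUN applied per group;
# negating the values gives a descending order
species_desc <- reorder(iris$Species, -iris$Sepal.Length, FUN = median)
levels(species_desc)

# The same call can be used directly inside aes()
p_sorted <- ggplot(iris, aes(x = reorder(Species, -Sepal.Length, FUN = median),
                             y = Sepal.Length)) +
  geom_boxplot() +
  labs(x = "Species", y = "Sepal Length")

print(p_sorted)
```

For iris the ascending order happens to match the alphabetical one, which is why the descending order is shown here.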
In base R, boxplots are created with the boxplot() function.
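For comparison, here is a sketch of the same iris plot drawn with boxplot() and its formula interface. The function also returns the statistics it computed invisibly, which can help when checking a ggplot2 imitation against the base R original:

```r
# Base R boxplot via the formula interface; col = "white" matches the question
b <- boxplot(Sepal.Length ~ Species, data = iris,
             xlab = "Species", ylab = "Sepal Length",
             main = "Sepal Variation Across Species",
             col = "white")

# One column per group: lower whisker, lower hinge, median,
# upper hinge, upper whisker
b$stats

# Values drawn as outliers, and the group each one belongs to
b$out
b$group
```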
To generate a "base R style" boxplot using ggplot2, we can layer 4 boxplot objects over top of one another. The order does matter here, so please keep this in mind if you modify the code. I strongly suggest that you explore this code by plotting each boxplot layer on its own; that way you can get a feel for how the different layers interact.
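The ordering of the boxplots works like this (ordered from bottom to top):
1. a fully dashed boxplot, which supplies the dashed whiskers and the open-circle outliers
2. a solid box drawn over the first, with ymin and ymax set to the hinges so that no whiskers are drawn
3. an error bar collapsed onto the upper whisker end, which gives the solid top cap
4. an error bar collapsed onto the lower whisker end, which gives the solid bottom cap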
I also added custom breaks to match your base R plot, which you can change depending on your needs. panel.border is used to create a thin border in the style of base R. To get the open circles that you want, we use outlier.shape.
The code:
library("ggplot2")
ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot(linetype = "dashed", outlier.shape = 1) + # dashed whiskers, open-circle outliers
  stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), outlier.shape = 1) + # solid box, no whiskers
  stat_boxplot(geom = "errorbar", aes(ymin = ..ymax..)) + # solid cap on the upper whisker
  stat_boxplot(geom = "errorbar", aes(ymax = ..ymin..)) + # solid cap on the lower whisker
  scale_y_continuous(breaks = seq(4.5, 8.0, 0.5)) +
  labs(title = "Sepal Variation Across Species",
       x = "Species",
       y = "Sepal Length") +
  theme_classic() + # remove panel background and gridlines
  theme(plot.title = element_text(hjust = 0.5, # hjust = 0.5 centers the title
                                  size = 14,
                                  face = "bold"),
        panel.border = element_rect(linetype = "solid",
                                    colour = "black", fill = NA, size = 0.5))
The plot:
Not quite exactly the same, but it seems to be a decent approximation. Hopefully this is close enough for your needs. Good luck, and happy plotting!
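To follow the suggestion of plotting each layer on its own, something like the sketch below could be used. It is written with after_stat(), which ggplot2 3.3.0 and later provide as the replacement for the ..name.. notation. The object names (base, layers) are just placeholders for this example:

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Species, y = Sepal.Length))

# Each entry draws exactly one of the four layers, bottom to top
layers <- list(
  dashed_box = base +
    geom_boxplot(linetype = "dashed", outlier.shape = 1),
  solid_box = base +
    stat_boxplot(aes(ymin = after_stat(lower), ymax = after_stat(upper)),
                 outlier.shape = 1),
  upper_cap = base +
    stat_boxplot(geom = "errorbar", aes(ymin = after_stat(ymax))),
  lower_cap = base +
    stat_boxplot(geom = "errorbar", aes(ymax = after_stat(ymin)))
)

# Print the plots one after another to see what each layer contributes
for (p in layers) print(p)
```

Viewed alone, the two error-bar layers show only short horizontal lines, because their ymin and ymax coincide.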
Building further on what @Marcus & @Moody_Mudskipper have provided:
# The argument list and defaults are copied from args(geom_boxplot)
geom_boxplotMod <- function(mapping = NULL, data = NULL, stat = "boxplot",
    position = "dodge2", ..., outlier.colour = NULL, outlier.color = NULL,
    outlier.fill = NULL, outlier.shape = 1, outlier.size = 1.5,
    outlier.stroke = 0.5, outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5,
    varwidth = FALSE, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
    linetype = "dashed")
{
  list(
    geom_boxplot(
      mapping = mapping, data = data, stat = stat, position = position,
      outlier.colour = outlier.colour, outlier.color = outlier.color,
      outlier.fill = outlier.fill, outlier.shape = outlier.shape,
      outlier.size = outlier.size, outlier.stroke = outlier.stroke,
      outlier.alpha = outlier.alpha, notch = notch,
      notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm,
      show.legend = show.legend, inherit.aes = inherit.aes,
      linetype = linetype, ...),
    # width = 0.25 makes the whisker caps narrower than the box
    stat_boxplot(geom = "errorbar", aes(ymin = ..ymax..), width = 0.25),
    stat_boxplot(geom = "errorbar", aes(ymax = ..ymin..), width = 0.25),
    stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..),
                 outlier.shape = 1),
    theme(panel.background = element_blank(),
          panel.border = element_rect(size = 1.5, fill = NA),
          plot.title = element_text(hjust = 0.5),
          axis.title = element_text(size = 12),
          axis.text = element_text(size = 10.5))
  )
}
library(ggplot2)
ggplot(iris, aes(x=Species,y=Sepal.Length, colour = Species)) +
  geom_boxplotMod() +
  ggtitle("Sepal Variation Across Species")
Created on 2020-07-20 by the reprex package (v0.3.0)
Here's a wrapper around @Marcus' great solution, for convenient use and more flexibility:
geom_boxplot2 <- function(mapping = NULL, data = NULL, stat = "boxplot", position = "dodge2",
..., outlier.colour = NULL, outlier.color = NULL, outlier.fill = NULL,
outlier.shape = 1, outlier.size = 1.5, outlier.stroke = 0.5,
outlier.alpha = NULL, notch = FALSE, notchwidth = 0.5, varwidth = FALSE,
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE,
linetype = "dashed"){
  list(
    geom_boxplot(mapping = mapping, data = data, stat = stat, position = position,
                 outlier.colour = outlier.colour, outlier.color = outlier.color,
                 outlier.fill = outlier.fill, outlier.shape = outlier.shape,
                 outlier.size = outlier.size, outlier.stroke = outlier.stroke,
                 outlier.alpha = outlier.alpha, notch = notch,
                 notchwidth = notchwidth, varwidth = varwidth, na.rm = na.rm,
                 show.legend = show.legend, inherit.aes = inherit.aes,
                 linetype = linetype, ...),
    stat_boxplot(aes(ymin = ..lower.., ymax = ..upper..), outlier.shape = 1),
    stat_boxplot(geom = "errorbar", aes(ymin = ..ymax..)),
    stat_boxplot(geom = "errorbar", aes(ymax = ..ymin..)),
    theme_classic(), # remove panel background and gridlines
    theme(plot.title = element_text(hjust = 0.5, # hjust = 0.5 centers the title
                                    size = 14,
                                    face = "bold"),
          panel.border = element_rect(linetype = "solid",
                                      colour = "black", fill = NA, size = 0.5))
  )
}
ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
geom_boxplot2() +
scale_y_continuous(breaks = seq(4.5, 8.0, 0.5)) + # not sure how to generalize this
labs(title = "Sepal Variation Across Species", y = "Sepal Length")
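One possible way to generalize the hard-coded breaks is to pass a function to scale_y_continuous(), which ggplot2 calls with the axis limits. The helper below, breaks_every(), is a hypothetical name made up for this sketch and is not part of ggplot2. It is shown with a plain geom_boxplot() so that the block runs on its own:

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns a function that takes
# the axis limits and yields breaks at every multiple of `width`
breaks_every <- function(width) {
  function(limits) {
    seq(floor(limits[1] / width) * width,
        ceiling(limits[2] / width) * width,
        by = width)
  }
}

p_breaks <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  scale_y_continuous(breaks = breaks_every(0.5))

print(p_breaks)
```

Breaks that fall outside the plotted range are simply not drawn, so the helper can round outwards safely. The scales package also has breaks_width(), which serves a similar purpose if an extra dependency is acceptable.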