I have been having issues with what seems to be a simple thing to do: grouped boxplots with a continuous x axis.
Here is some minimal data:
df <- cbind(expand.grid(x=1:10, rep=1:20, fill=c("A", "B")), y=runif(400))
And here is what I want; you will see I have forced the x axis to be discrete:
ggplot(df, aes(x=as.factor(x), y=y, fill=fill)) + geom_boxplot()
This is what I get when I leave x as continuous, without a grouping:
ggplot(df, aes(x=x, y=y, fill=fill)) + geom_boxplot()
When I add a grouping, the color disappears:
ggplot(df, aes(x=x, y=y, group=x, fill=fill)) + geom_boxplot()
To be clear, what I would want in geom_point would be:
ggplot(df, aes(x=x, y=y, group=x, color=fill)) + geom_point(position=position_dodge(width=.7))
...but if I try to set a dodge in boxplot:
ggplot(df, aes(x=x, y=y, color=fill)) + geom_boxplot(position=position_dodge(width=.7))
Any tips? I have tried searching around: this question addressed continuous boxplots, but without the coloring issue; this question makes me wonder if I need to set an interaction, but that doesn't seem to give the desired results. Any help would be hugely appreciated!
To create a box plot by group in base R, you can pass a formula of the form y ~ x to the boxplot function, where y is a numerical variable and x is a categorical (grouping) variable. You can either refer to the columns directly with the dollar sign or name them in the formula and supply the data frame through the data argument.
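As a rough sketch of that formula interface, here it is applied to the question's simulated data, with the numeric column y on the left of the formula and the fill factor on the right. The set.seed call and the variable names stats_by_fill and stats_by_combo are assumptions added for illustration. plot = FALSE makes boxplot return the box statistics instead of drawing, so the grouping can be inspected directly.

```r
# Same simulated data as in the question; the seed is an assumption,
# added only so the numbers are reproducible.
set.seed(1)
df <- cbind(expand.grid(x = 1:10, rep = 1:20, fill = c("A", "B")),
            y = runif(400))

# Formula interface: numeric response ~ grouping variable.
# plot = FALSE returns the statistics rather than drawing the plot.
stats_by_fill <- boxplot(y ~ fill, data = df, plot = FALSE)
stats_by_fill$names  # group labels, one per box
stats_by_fill$n      # number of observations behind each box

# Two grouping variables on the right-hand side give one box
# per combination of fill and x.
stats_by_combo <- boxplot(y ~ fill + x, data = df, plot = FALSE)
length(stats_by_combo$names)
```

Dropping plot = FALSE should draw the boxes with base graphics.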
A box plot is a graph of the distribution of a continuous variable, based on the quartiles of that variable. The quartiles are the three cut points that divide the ordered values into four groups with the same number of observations: the smallest values fall in the first group and the largest in the fourth.
Box plots are composed of an x-axis and a y-axis. The x-axis assigns one box to each category or numeric field variable. The y-axis shows the minimum, first quartile, median, third quartile, and maximum of each set of values.
From ?aes_group_order:
By default, the group is set to the interaction of all discrete variables in the plot.
In your data, you only have one discrete variable, "fill". However, we wish the data to be grouped by both "fill" and "x". Thus, we need to specify the desired grouping explicitly using the group aesthetic. And yes, you were correct, interaction is the way to go.
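To see why the grouping matters, here is a small base R sketch that counts how many groups each choice produces on the question's data. The set.seed call and the names n_fill, n_x and n_both are assumptions added for illustration.

```r
# Same simulated data as in the question; the seed is an assumption,
# added only for reproducibility.
set.seed(1)
df <- cbind(expand.grid(x = 1:10, rep = 1:20, fill = c("A", "B")),
            y = runif(400))

# Grouping by the only discrete variable: one group per fill level.
n_fill <- nlevels(factor(df$fill))

# Grouping by x alone (group = x): one group per x value,
# so the fill levels are mixed within each group.
n_x <- length(unique(df$x))

# Grouping by the interaction: one group per (x, fill) combination.
n_both <- nlevels(interaction(df$x, df$fill))

c(fill = n_fill, x = n_x, both = n_both)
```

Only the interaction gives a separate group for every box you want drawn.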
First, a slightly smaller data set (easier to link data to output):
d <- data.frame(x = rep(c(1, 2, 4), each = 8),
                grp = rep(c("a", "b"), each = 4),
                y = sample(24))
Then the plot, where we group data by the different combinations of "x" and "grp" (interaction(x, grp)), and fill the boxes by "grp":
ggplot(d, aes(x = x, y = y, group = interaction(x, grp), fill = grp)) +
  geom_boxplot()
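The same idea can be applied to the data from the question. This is a sketch that assumes ggplot2 is installed; the set.seed call and the object name p are added for illustration. With x left continuous, it should draw one box per combination of x and fill, coloured by fill.

```r
library(ggplot2)

# Same simulated data as in the question; the seed is an assumption,
# added only for reproducibility.
set.seed(1)
df <- cbind(expand.grid(x = 1:10, rep = 1:20, fill = c("A", "B")),
            y = runif(400))

# Keep x continuous, but group by every (x, fill) combination
# so each combination gets its own box.
p <- ggplot(df, aes(x = x, y = y,
                    group = interaction(x, fill), fill = fill)) +
  geom_boxplot()
p
```

If the plot should use coloured outlines rather than filled boxes, mapping color = fill instead of fill = fill with the same group aesthetic is the analogous change.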