Fill and dodge boxplots by group on a continuous x axis

I have been having issues with what seems to be a simple thing to do: grouped boxplots with a continuous x axis.

Here is some minimal data:

df <- cbind(expand.grid(x=1:10, rep=1:20, fill=c("A", "B")), y=runif(400))

And here is what I want; you will see I have forced the x axis to be discrete:

ggplot(df, aes(x=as.factor(x), y=y,  fill=fill)) + geom_boxplot()

This is what I get when I leave x as continuous, without a grouping:

ggplot(df, aes(x=x, y=y,  fill=fill)) + geom_boxplot()

When I add a grouping, the color disappears:

ggplot(df, aes(x=x, y=y, group=x, fill=fill)) + geom_boxplot()

To be clear, what I would want in geom_point would be:

ggplot(df, aes(x=x, y=y, group=x, color=fill)) + geom_point(position=position_dodge(width=.7))

...but if I try to set a dodge in boxplot:

ggplot(df, aes(x=x, y=y, color=fill)) + geom_boxplot(position=position_dodge(width=.7))

Any tips? I have tried searching around: this question addressed continuous boxplots, but without the coloring issue; this question makes me wonder if I need to set an interaction, but that doesn't seem to give the desired result. Any help would be hugely appreciated!

asked Jul 14 '17 10:07 by NWaters

1 Answer

From ?aes_group_order:

By default, the group is set to the interaction of all discrete variables in the plot.

In your data, the only discrete variable is "fill", so by default the boxes are grouped by "fill" alone. However, we want the data to be grouped by both "fill" and "x". Thus, we need to specify the desired grouping explicitly using the group aesthetic. And yes, you were correct: interaction is the way to go.
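
As a rough check of that default (a sketch using the data frame from the question; g_default and g_wanted are names made up here for illustration), you can compare the number of groups given by "fill" alone with the number given by the interaction of "x" and "fill":

```r
# Data from the question: 10 x values, 20 replicates, 2 fill levels
df <- cbind(expand.grid(x = 1:10, rep = 1:20, fill = c("A", "B")), y = runif(400))

# Grouping by the only discrete variable, "fill"
g_default <- df$fill

# Grouping we actually want: one group per (x, fill) combination
g_wanted <- interaction(df$x, df$fill)

nlevels(g_default)  # 2 groups, so only 2 boxes
nlevels(g_wanted)   # 20 groups, one box per x value and fill level
```

This only counts groups in base R; it does not call ggplot2, so it illustrates the idea rather than showing what ggplot2 does internally.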

First, a slightly smaller data set (easier to link data to output):

d <- data.frame(x = rep(c(1, 2, 4), each = 8),
                grp = rep(c("a", "b"), each = 4),
                y = sample(24))

Then the plot, where we group data by the different combinations of "x" and "grp" (interaction(x, grp)), and fill the boxes by "grp":

ggplot(d, aes(x = x, y = y, group = interaction(x, grp), fill = grp)) +
  geom_boxplot()

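The same grouping should carry over to the data in the question. The following is a sketch, assuming ggplot2 is installed; the set.seed() call, the scale_x_continuous() breaks and the n_boxes check are additions for illustration and not part of the original answer:

```r
library(ggplot2)

set.seed(1)  # only to make the random y values reproducible
df <- cbind(expand.grid(x = 1:10, rep = 1:20, fill = c("A", "B")), y = runif(400))

# x stays continuous; group by every (x, fill) combination and fill by "fill"
p <- ggplot(df, aes(x = x, y = y, group = interaction(x, fill), fill = fill)) +
  geom_boxplot() +
  scale_x_continuous(breaks = 1:10)  # assumption: one tick per x value

# One row of computed layer data per box: 10 x values times 2 fill levels
n_boxes <- nrow(layer_data(p))

print(p)
```

If the boxes look too wide or too narrow, the width can be tuned through the position argument, as in the geom_point example from the question.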

answered Oct 13 '22 16:10 by Henrik