Joining means on a boxplot with a line (ggplot2)

I have a boxplot showing multiple boxes, and I want to connect the means of the boxes with a line. The boxplot does not display the mean by default; the middle line indicates only the median. I tried:

ggplot(data, aes(x=xData, y=yData, group=g)) +
  geom_boxplot() +
  stat_summary(fun.y=mean, geom="line")

This does not work.

Interestingly enough, doing

stat_summary(fun.y=mean, geom="point")  

draws the mean point in each box. Why would "line" not work?

I am after something like this, but using ggplot2: http://www.aliquote.org/articles/tech/RMB/c4_sols/plot45.png

nixbox asked Oct 21 '10 17:10


1 Answer

Is that what you are looking for?

library(ggplot2)

x <- factor(rep(1:10, 100))
y <- rnorm(1000)
df <- data.frame(x=x, y=y)

ggplot(df, aes(x=x, y=y)) +
  geom_boxplot() +
  stat_summary(fun=mean, geom="line", aes(group=1)) +
  stat_summary(fun=mean, geom="point")
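If you would rather compute the means yourself, a rough alternative is to build a small data frame of per-level means and draw it as its own line and point layers. This is only a sketch: the `means` data frame and the `set.seed()` call are my own additions for illustration and are not part of the code above.

```r
library(ggplot2)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(x = factor(rep(1:10, 100)), y = rnorm(1000))

# One row per level of x, holding the mean of y for that level
means <- aggregate(y ~ x, data = df, FUN = mean)

p <- ggplot(df, aes(x = x, y = y)) +
  geom_boxplot() +
  # group = 1 puts all ten means in a single group so they can be joined
  geom_line(data = means, aes(group = 1)) +
  geom_point(data = means)

print(p)
```

The result should look the same as the stat_summary() version; the only difference is that the summary is computed before plotting rather than by ggplot2.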

Update:

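Some clarification about setting group=1: because x is a factor, ggplot2 by default treats each level as its own group, so each group contributes a single mean and there is nothing for a line to connect. I think that I found an explanation in Hadley Wickham's book "ggplot2: Elegant Graphics for Data Analysis". On page 51 he writes: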

Different groups on different layers.

Sometimes we want to plot summaries based on different levels of aggregation. Different layers might have different group aesthetics, so that some display individual level data while others display summaries of larger groups.

Building on the previous example, suppose we want to add a single smooth line to the plot just created, based on the ages and heights of all the boys. If we use the same grouping for the smooth that we used for the line, we get the first plot in Figure 4.4.

p + geom_smooth(aes(group = Subject), method="lm", se = F)

This is not what we wanted; we have inadvertently added a smoothed line for each boy. This new layer needs a different group aesthetic, group = 1, so that the new line will be based on all the data, as shown in the second plot in the figure. The modified layer looks like this:

p + geom_smooth(aes(group = 1), method="lm", size = 2, se = F)

[...] Using aes(group = 1) in the smooth layer fits a single line of best fit across all boys.
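To see the same grouping effect on the boxplot example, one option is to inspect the data each layer computes with layer_data(). A minimal sketch, assuming ggplot2 3.3.0 or later (for the `fun` argument); the object names and the fixed seed are illustrative only.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
df <- data.frame(x = factor(rep(1:10, 100)), y = rnorm(1000))

base <- ggplot(df, aes(x = x, y = y)) + geom_boxplot()

# Default grouping: the discrete x variable defines the groups
per_level <- base + stat_summary(fun = mean, geom = "line")

# Overridden grouping: every observation is assigned to group 1
joined <- base + stat_summary(fun = mean, geom = "line", aes(group = 1))

# layer_data() returns the data computed for a layer; layer 2 is stat_summary
d_per_level <- layer_data(per_level, 2)
d_joined <- layer_data(joined, 2)

# Count the distinct groups in each version of the summary layer
n_groups_default <- length(unique(d_per_level$group))
n_groups_joined <- length(unique(d_joined$group))
print(c(default = n_groups_default, joined = n_groups_joined))
```

Comparing the two counts shows whether the means ended up in separate groups or in a single group that geom="line" can join.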

Bernd Weiss answered Oct 18 '22 21:10
