I am doing a basic boxplot where y = age and x = patient groups:
age <- ggplot(data, aes(factor(group2), age)) + ylim(15, 80)
age + geom_boxplot(fill = "grey80", colour = "#3366FF")
I was hoping you could help me out with a few things:
1) Is it possible to include a number of observations per group above each group boxplot (but NOT on the X axis where my group labels are) without having to do this in paint :)? I have tried using:
age + annotate("text", x = "CON", y = 60, label = "25")
where CON is the 1st group and y = 60 is roughly just above the boxplot for this group. However, the command didn't work. I assume this is because x is read as a continuous rather than a categorical variable.
2) Also, although there are plenty of questions about using the mean rather than the median for boxplots, I still haven't found code that works for me.
3) On the same matter is there a way you could include the mean group stat in the boxplot? Perhaps using
age + stat_summary(fun.y=mean, colour="red", geom="point")
which however only includes a dot of where the mean lies. Or again using
age + annotate("text", x = "CON", y = 30, label = "30")
where CON is the 1st group and y = 30 is roughly the group age mean. Knowing how flexible and rich ggplot2 syntax is, I was hoping that there is a more elegant way of using the real stats output rather than annotate.
Any suggestions/links would be much appreciated!
Thanks!!
To show mean values in a boxplot using ggplot2, use the stat_summary() function, which computes summary statistics and adds them to a plot built with ggplot() as an extra layer.
Box plot for multiple groups: to create a box plot by group in base R, pass a formula of the form y ~ x to the boxplot function, where y is a numerical variable and x is a categorical variable. You can pass the variables by accessing them from the data frame with the dollar sign or by subsetting the data frame.
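As a rough illustration of the formula interface described above, here is a minimal base R sketch. The built-in mtcars data and the axis labels are my own illustrative choices, not something from the question.

```r
# Base R sketch using the built-in mtcars data (an illustrative choice).
# mpg is the numeric response (y), cyl is the grouping variable (x).
bp <- boxplot(mpg ~ cyl, data = mtcars,
              xlab = "Cylinders", ylab = "Miles per gallon",
              col = "grey80", border = "#3366FF")

# boxplot() invisibly returns the statistics it drew:
# bp$n holds the group sizes and bp$stats[3, ] the group medians.
print(bp$n)
print(bp$stats[3, ])
```

The returned group sizes in bp$n could then be placed on the plot with text() or mtext() if counts are wanted on a base R figure.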
A boxplot summarizes the distribution of a continuous variable and notably displays the median of each group.
To overlay the individual data points on a base R box plot (for example with stripchart), pass the same data you used to create the box plot, set the method to "jitter" so that random noise is added to the points and overplotting is avoided, set the desired aesthetic arguments such as pch or col, and set add = TRUE so the points are drawn over the previous plot.
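A minimal sketch of that overlay, assuming the base R stripchart function is what the paragraph above refers to. The mtcars data, the colours, and the groups variable name are illustrative assumptions.

```r
# Split the response by group so both plots use exactly the same data.
groups <- split(mtcars$mpg, mtcars$cyl)

# Draw the box plot first ...
boxplot(groups, col = "grey80",
        xlab = "Cylinders", ylab = "Miles per gallon")

# ... then add the jittered raw points on top of it.
# vertical = TRUE matches the orientation of the box plot.
stripchart(groups, method = "jitter", vertical = TRUE,
           pch = 19, col = "#3366FF", add = TRUE)
```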
Is this anything like what you're after? With stat_summary, as requested:
library(ggplot2)

# function for number of observations
# (returns a data frame, which is what fun.data is documented to expect)
give.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(data.frame(y = median(x)*1.05, label = length(x)))
}

# function for mean labels
mean.n <- function(x){
  # experiment with the multiplier to find the perfect position
  return(data.frame(y = median(x)*0.97, label = round(mean(x),2)))
}

# plot: fun.data supplies both the position and the label of each text item
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text") +
  stat_summary(fun.data = mean.n, geom = "text", colour = "red")
The black number is the number of observations; the red number is the mean value. joran's answer shows you how to put the numbers at the top of the boxes.
hat-tip: https://stackoverflow.com/a/3483657/1036500
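If the counts should sit above each box rather than near the median, one possible variation is sketched below. The helper name n_at_top and the "n = " label format are hypothetical choices of mine, and the `fun` argument assumes ggplot2 3.3.0 or later (older versions call it `fun.y`).

```r
library(ggplot2)

# Hypothetical helper: put the group size at the group's maximum value.
n_at_top <- function(x) {
  data.frame(y = max(x), label = paste0("n = ", length(x)))
}

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  # vjust = -0.5 nudges the text just above the top whisker or outlier
  stat_summary(fun.data = n_at_top, geom = "text", vjust = -0.5) +
  # mark each group mean with a red point
  stat_summary(fun = mean, geom = "point", colour = "red")

print(p)
```

Because the label position comes from the data, nothing has to be hard-coded per group as it would be with annotate.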
I think this is what you're looking for maybe?
library(plyr)
library(ggplot2)

# one row of box statistics per group
myboxplot <- ddply(mtcars, .(cyl), summarise,
                   min = min(mpg),
                   q1 = quantile(mpg, 0.25),
                   med = median(mpg),
                   q3 = quantile(mpg, 0.75),
                   max = max(mpg),
                   lab = length(cyl))

# draw the boxes from the precomputed statistics and label each with its count
ggplot(myboxplot, aes(x = factor(cyl))) +
  geom_boxplot(aes(lower = q1, upper = q3, middle = med, ymin = min, ymax = max),
               stat = "identity") +
  geom_text(aes(y = max, label = lab), vjust = 0)
I just realized I mistakenly used the median when you were asking about the mean, but you can obviously use whatever function you please for the middle aesthetic.
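For instance, a sketch of the same idea with the mean as the centre line, written with base R instead of plyr. The box_stats name and its column names are assumptions of mine; as in the code above, the whiskers run from the minimum to the maximum rather than following the 1.5 × IQR rule.

```r
library(ggplot2)

# One row of box statistics per group, with the mean as the centre line.
box_stats <- do.call(rbind, lapply(split(mtcars, mtcars$cyl), function(d) {
  data.frame(cyl    = d$cyl[1],
             ymin   = min(d$mpg),
             lower  = unname(quantile(d$mpg, 0.25)),
             middle = mean(d$mpg),          # mean instead of median
             upper  = unname(quantile(d$mpg, 0.75)),
             ymax   = max(d$mpg),
             n      = nrow(d))
}))

p <- ggplot(box_stats, aes(x = factor(cyl))) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity") +
  # group size just above the top whisker
  geom_text(aes(y = ymax, label = n), vjust = -0.5)

print(p)
```

Note that with strongly skewed data the mean can fall outside the interquartile box, so the centre line may look odd for some groups.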