
How to display the median value in a boxplot in ggplot?

Tags:

r

ggplot2

I am trying to show the median value (i.e., the horizontal bar) in a box plot using ggplot(). R keeps asking me to specify the y axis, and I am a bit stuck.

p <-structure(list(TYPE = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 2L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 1L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("PM BUSINESS", 
"PM CONSUMER", "PREPAY"), class = "factor"), TOTALREV = c(52.13, 
53.01, 396.49, 596.52, 0, 33.89, 183.43, 0, 174.67, 120.56, 619.29, 
171, 142.2, 77.14, 16.69, 176.86, 103.79, 799.8, 137.84, 187.84, 
201.05, 16.69, 154.95, 195.98, 17.07, 158.96, 166.86, 8.89, 434.59, 
34.55, 196.97, 783.74, 216.27, 1533.98, 137.6, 52.22, 88.61, 
69.52, 52.18, 368.22, 139.89, 214.22, 163.46, 295.49, 319.73, 
933.91, 199.19, 118.72, 0, 174.99, 141.72, 52.12, 115.25, 106.57, 
106.12, 153.84, 1.45, 4.32, 168.93, 34.76, 249.21, 101.25, 87.69, 
20.62, 0.87, 17.39, 0, 34.5, 131.36, 0, 106.43, 257.45, 0, 0, 
256.63, 466.93, 44.25, 339.15, 71.42, 270.81, 145.85, 670.52, 
187.06, 170.61, 153.59, 21.69, 166.14, 97, 104.4, 517.19, 230.78, 
14.11, 52.33, 398.61, 56.65, 0, 26.02, 0, 154.78, 154.78)), .Names = c("TYPE", 
"TOTALREV"), row.names = 23961:24060, class = "data.frame")

x6.3 <- qplot(TYPE, TOTALREV, data =p, geom = "boxplot")
x6.3+ stat_bin(geom="text",aes(x=TYPE,y=TOTALREV,label=TOTALREV),size = 3, hjust = 0.5, vjust = -1.5,position ="identity")
Luo Lei asked Nov 13 '12 22:11



1 Answer

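I don't think that stat_bin is the way to go: it bins and counts observations rather than computing a summary such as the median. Compute the median per group first, then pass those values to geom_text: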

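library(plyr)
library(ggplot2)

# One row per TYPE, holding that group's median TOTALREV
p_meds <- ddply(p, .(TYPE), summarise, med = median(TOTALREV))

# Draw the boxplot, then label each group at its median;
# vjust = -1.5 lifts the text above the median line
ggplot(p, aes(x = TYPE, y = TOTALREV)) + 
    geom_boxplot() + 
    geom_text(data = p_meds, aes(x = TYPE, y = med, label = med), 
              size = 3, vjust = -1.5)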
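
If you would rather not build a separate summary data frame, stat_summary() can probably compute and label the medians in one step. This is only a minimal sketch, assuming ggplot2 3.3.0 or later (for the fun argument and after_stat()). The data frame toy and its values are hypothetical and only stand in for p from the question; toy_meds is a base R cross-check, not part of the plot.

```r
library(ggplot2)

# Hypothetical toy data standing in for `p` from the question
toy <- data.frame(
  TYPE = factor(c("PREPAY", "PREPAY", "PREPAY",
                  "PM CONSUMER", "PM CONSUMER", "PM CONSUMER")),
  TOTALREV = c(10, 20, 60, 100, 300, 800)
)

# Per-group medians computed with base R, as a check on the labels
toy_meds <- aggregate(TOTALREV ~ TYPE, data = toy, FUN = median)

# stat_summary() applies median() within each x group; after_stat(y)
# exposes the computed value so it can be mapped to the text label
plt <- ggplot(toy, aes(x = TYPE, y = TOTALREV)) +
  geom_boxplot() +
  stat_summary(fun = median, geom = "text",
               aes(label = after_stat(y)),
               size = 3, vjust = -1.5)

print(plt)
```

To try it on the question's data, replace toy with p. If the medians carry many decimals, wrapping the label as round(after_stat(y), 1) should keep the text short.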


As a side note, I generally find that people are less confused with ggplot once they learn to stop using qplot.

joran answered Sep 20 '22 06:09