I am trying to show the median value (i.e. the horizontal bar) in a box plot using ggplot(). R keeps asking me to specify the y axis. I am a bit stuck.
p <-structure(list(TYPE = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 3L, 3L, 3L, 1L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 1L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L), .Label = c("PM BUSINESS",
"PM CONSUMER", "PREPAY"), class = "factor"), TOTALREV = c(52.13,
53.01, 396.49, 596.52, 0, 33.89, 183.43, 0, 174.67, 120.56, 619.29,
171, 142.2, 77.14, 16.69, 176.86, 103.79, 799.8, 137.84, 187.84,
201.05, 16.69, 154.95, 195.98, 17.07, 158.96, 166.86, 8.89, 434.59,
34.55, 196.97, 783.74, 216.27, 1533.98, 137.6, 52.22, 88.61,
69.52, 52.18, 368.22, 139.89, 214.22, 163.46, 295.49, 319.73,
933.91, 199.19, 118.72, 0, 174.99, 141.72, 52.12, 115.25, 106.57,
106.12, 153.84, 1.45, 4.32, 168.93, 34.76, 249.21, 101.25, 87.69,
20.62, 0.87, 17.39, 0, 34.5, 131.36, 0, 106.43, 257.45, 0, 0,
256.63, 466.93, 44.25, 339.15, 71.42, 270.81, 145.85, 670.52,
187.06, 170.61, 153.59, 21.69, 166.14, 97, 104.4, 517.19, 230.78,
14.11, 52.33, 398.61, 56.65, 0, 26.02, 0, 154.78, 154.78)), .Names = c("TYPE",
"TOTALREV"), row.names = 23961:24060, class = "data.frame")
x6.3 <- qplot(TYPE, TOTALREV, data =p, geom = "boxplot")
x6.3+ stat_bin(geom="text",aes(x=TYPE,y=TOTALREV,label=TOTALREV),size = 3, hjust = 0.5, vjust = -1.5,position ="identity")
A boxplot summarizes the distribution of a continuous variable and notably displays the median of each group.
To show the mean (or another summary statistic) on a box plot in ggplot2, use stat_summary(). It computes the summary for each group and adds it to a ggplot() call as a new layer.
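The same stat_summary() approach can label the median directly, with no separate summary data frame. This is a minimal sketch. It assumes ggplot2 3.3.0 or later, which provides the `fun` argument and after_stat(); older versions used `fun.y` and `..y..` instead. The data frame `toy` and its values are hypothetical stand-ins for `p` from the question.

```r
library(ggplot2)

# Hypothetical stand-in for `p`: two groups with known medians (A: 20, B: 25)
toy <- data.frame(
  TYPE = factor(c("A", "A", "A", "B", "B", "B", "B")),
  TOTALREV = c(10, 20, 30, 5, 15, 35, 100)
)

g <- ggplot(toy, aes(x = TYPE, y = TOTALREV)) +
  geom_boxplot() +
  # Compute the median of TOTALREV per TYPE and draw it as text;
  # after_stat(y) is the value stat_summary() computed
  stat_summary(fun = median, geom = "text",
               aes(label = after_stat(y)),
               size = 3, vjust = -0.5)

# Layer 2 holds the values computed by stat_summary()
built <- ggplot_build(g)$data[[2]]
```

Printing `g` draws the plot. `built$y` holds the per-group medians that appear as labels.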
In a vertical box plot, the lower and upper edges of the box are the lower and upper quartiles. When the box plot is drawn horizontally, these are the left and right sides. The box covers the interquartile range, which contains the middle 50% of the data. The line that splits the box in two is the median (the middle quartile). It marks the mid-point of the data: half the values lie at or above it and half at or below it. Sometimes the mean is also marked with a dot or a cross.
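These definitions can be checked numerically. Here is a small sketch using hypothetical values for a single group. It computes the quartiles, median and IQR with base R and compares them with the box that ggplot2 builds. As far as I know, ggplot2's box plot stat uses stats::quantile() with its default method, so the two should agree.

```r
library(ggplot2)

# Hypothetical values for a single group
x <- c(10, 20, 30, 40, 50)

# Quartiles, median and interquartile range computed directly
q <- quantile(x, c(0.25, 0.5, 0.75))
m <- median(x)
iqr <- IQR(x)

# The same box as ggplot2 computes it: lower hinge, median bar, upper hinge
b <- ggplot(data.frame(group = "a", value = x), aes(x = group, y = value)) +
  geom_boxplot()
box <- ggplot_build(b)$data[[1]]
box[, c("lower", "middle", "upper")]
```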
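I don't think that stat_bin is the way to go. It bins a variable and counts the observations in each bin, so it never computes a median. Instead, compute the median of each group yourself and add it with geom_text():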
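library(plyr)
library(ggplot2)
# One row per TYPE holding the median of TOTALREV
p_meds <- ddply(p, .(TYPE), summarise, med = median(TOTALREV))
# Draw the box plots, then label each median bar using p_meds
ggplot(p, aes(x = TYPE, y = TOTALREV)) +
  geom_boxplot() +
  geom_text(data = p_meds, aes(x = TYPE, y = med, label = med),
            size = 3, vjust = -1.5)  # negative vjust lifts the label above the bar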
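plyr is retired in favor of dplyr. Here is a sketch of the same summarise-then-geom_text idea using dplyr's group_by() and summarise() in place of ddply(). This is an alternative, not the answer's own code. The data frame `toy` is a hypothetical stand-in for `p`.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `p`: medians are 20 (A) and 25 (B)
toy <- data.frame(
  TYPE = factor(c("A", "A", "A", "B", "B", "B", "B")),
  TOTALREV = c(10, 20, 30, 5, 15, 35, 100)
)

# One row per TYPE with its median, like ddply(..., summarise, med = ...)
toy_meds <- toy %>%
  group_by(TYPE) %>%
  summarise(med = median(TOTALREV))

g <- ggplot(toy, aes(x = TYPE, y = TOTALREV)) +
  geom_boxplot() +
  geom_text(data = toy_meds, aes(x = TYPE, y = med, label = med),
            size = 3, vjust = -1.5)

# Layer 2 is the text layer; its y values are the medians
built <- ggplot_build(g)$data[[2]]
```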
As a side note, I generally find that people are less confused with ggplot once they learn to stop using qplot.