I have a dataset which includes data from 100 simulations of train runs in a network with 4 trains, 6 stations and lateness at arrival for each train at each station. My data looks something like this:
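library(ggplot2)
set.seed(1)  # arbitrary seed, added so the random lateness values are reproducible
MyData <- data.frame(
  Simulation = rep(sort(rep(1:100, 6)), 4),
  Train_number = sort(rep(c(100, 102, 104, 106), 100*6)),
  Stations = rep(c("ST_1", "ST_2", "ST_3", "ST_4", "ST_5", "ST_6"), 100*4),
  # 400 values, recycled by data.frame() to fill all 2400 rows
  Arrival_Lateness = c(rep(0, 60), rexp(40, 1), rep(0, 60), rexp(40, 2), rep(0, 60), rexp(40, 3), rep(0, 60), rexp(40, 5))
)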
I now create boxplots for each train and station with custom quantiles (thanks to jlhoward):
# Custom boxplot statistics: whiskers at the 5th and 95th percentiles
f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
ggplot(MyData, aes(factor(Stations), Arrival_Lateness, fill = factor(Train_number))) +
  stat_summary(fun.data = f, geom = "boxplot", position = "dodge")
Very pretty:
What I am missing now is the outliers. I would like to plot the top 5% of observations for each train/station combination on top of each boxplot. This is what I tried (inspired by this question):
# Keep only the observations above the 95th percentile
q <- function(x) {
  subset(x, quantile(x, 0.95) < x)
}
ggplot(MyData, aes(factor(Stations), Arrival_Lateness, fill = factor(Train_number))) +
  stat_summary(fun.data = f, geom = "boxplot", position = "dodge") +
  stat_summary(fun.y = q, geom = "point", position = "dodge")
I get the message "ymax not defined: adjusting position using y instead", and the resulting chart is clearly not what I wanted.
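Is this what you want? Both layers use the same position_dodge(1), so the points line up with their boxes, and the points are colored by train number.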
ggplot(MyData, aes(factor(Stations), Arrival_Lateness,
                   fill = factor(Train_number))) +
  stat_summary(fun.data = f, geom = "boxplot",
               position = position_dodge(1)) +
  stat_summary(aes(color = factor(Train_number)), fun.y = q, geom = "point",
               position = position_dodge(1))
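The points layer can also be built without stat_summary, by picking out the observations above each group's 95th percentile beforehand and passing them to geom_point. The following is only a sketch under a few assumptions: it rebuilds the sample data from the question with an arbitrary seed, the names is_outlier and outliers are introduced here for illustration, and the plotting step is skipped if ggplot2 is not installed.

```r
# Rebuild the question's sample data (the seed is arbitrary, for reproducibility).
set.seed(1)
MyData <- data.frame(
  Simulation = rep(sort(rep(1:100, 6)), 4),
  Train_number = sort(rep(c(100, 102, 104, 106), 100 * 6)),
  Stations = rep(c("ST_1", "ST_2", "ST_3", "ST_4", "ST_5", "ST_6"), 100 * 4),
  Arrival_Lateness = c(rep(0, 60), rexp(40, 1), rep(0, 60), rexp(40, 2),
                       rep(0, 60), rexp(40, 3), rep(0, 60), rexp(40, 5))
)

# Flag observations above the 95th percentile of their train/station group.
# ave() applies FUN within each group and returns a vector aligned with the rows;
# the logical result comes back as 0/1, hence as.logical().
is_outlier <- as.logical(ave(
  MyData$Arrival_Lateness, MyData$Train_number, MyData$Stations,
  FUN = function(x) x > quantile(x, 0.95)
))
outliers <- MyData[is_outlier, ]  # hypothetical helper data frame

# Optional plotting step; only runs when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Same custom boxplot statistics as in the question.
  f <- function(x) {
    r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
    names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
    r
  }
  p <- ggplot(MyData, aes(factor(Stations), Arrival_Lateness,
                          fill = factor(Train_number))) +
    stat_summary(fun.data = f, geom = "boxplot",
                 position = position_dodge(1)) +
    # The precomputed outliers get the same dodge width as the boxes.
    geom_point(data = outliers, aes(color = factor(Train_number)),
               position = position_dodge(1))
}
```

Printing p draws the chart. Keeping the outliers in their own data frame also makes it easy to inspect them, for example with table(outliers$Train_number, outliers$Stations).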
IMHO this faceted version, with one panel per station, is a little easier to interpret.
ggplot(MyData, aes(factor(Train_number), Arrival_Lateness,
                   fill = factor(Train_number))) +
  stat_summary(fun.data = f, geom = "boxplot",
               position = position_dodge(1)) +
  stat_summary(aes(color = factor(Train_number)), fun.y = q, geom = "point",
               position = position_dodge(1)) +
  facet_grid(. ~ Stations, scales = "free") +
  theme(axis.text.x = element_text(angle = -90, hjust = 1, vjust = 0.2)) +
  labs(x = "Train Number")
EDIT (Response to OP's comment)
ggplot(MyData, aes(factor(Train_number), Arrival_Lateness,
                   fill = factor(Train_number))) +
  stat_summary(fun.data = f, geom = "boxplot",
               position = position_dodge(1)) +
  stat_summary(aes(color = factor(Train_number)), fun.y = q, geom = "point",
               position = position_dodge(1)) +
  facet_grid(. ~ Stations, scales = "free") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  scale_fill_discrete("Train") + scale_color_discrete("Train") +
  labs(x = "")
To turn off the x-axis text and tick marks, use theme(...=element_blank()). To turn off the axis label, use labs(x=""). Also, the fill and color scales have to have the same name, or they display separately.
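As a small illustration of the legend point, the two scale names can also be set in one place with labs(), which accepts aesthetic names as arguments. This is a sketch on a tiny made-up data frame (df and its columns are invented here, not taken from the question), so treat it as one possible way of doing it rather than the only one.

```r
library(ggplot2)

# Tiny made-up data set, for illustration only.
df <- data.frame(
  train = factor(rep(c(100, 102), each = 3)),
  station = rep(c("ST_1", "ST_2", "ST_3"), 2),
  lateness = c(0.2, 0.5, 0.1, 0.4, 0.3, 0.6)
)

# fill and color are both mapped to train; giving both scales the same
# title ("Train") lets ggplot2 draw them as a single legend.
p <- ggplot(df, aes(station, lateness, fill = train, color = train)) +
  geom_point(shape = 21, size = 3) +
  labs(x = "", fill = "Train", color = "Train")
```

If the two titles differ, for example fill = "Train" and color = "Train number", the two legends should appear separately again.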