I often use boxplots in my work and like the ggplot2 aesthetics. But the standard geom_boxplot lacks two things that are important to me: end caps on the whiskers and median labels. Thanks to information from here, I've written a function:
gBoxplot <- function(formula = NULL, data = NULL, font = "CMU Serif", fsize = 18){
  require(ggplot2)
  vars <- all.vars(formula)
  response <- vars[1]
  factor <- vars[2]
  # A function for medians labelling
  fun_med <- function(x){
    return(data.frame(y = median(x), label = round(median(x), 3)))
  }
  p <- ggplot(data, aes_string(x = factor, y = response)) +
    stat_boxplot(geom = "errorbar", width = 0.6) +
    geom_boxplot() +
    stat_summary(fun.data = fun_med, geom = "label", family = font, size = fsize/3,
                 vjust = -0.1) +
    theme_grey(base_size = fsize, base_family = font)
  return(p)
}
There are also font settings, but this is just because I'm too lazy to make a theme. Here is an example:
gBoxplot(hwy ~ class, mpg)
Good enough for me, but there are some restrictions (it cannot use auto-dodging, etc.), and it would be better to make a new geom based on geom_boxplot. I've read the vignette Extending ggplot2, but cannot understand how to implement it.
Any help will be appreciated.
So, I've been thinking about this one for a while. Basically, when you create a new primitive, you normally write a combination of a layer-function, a stat-ggproto, and a geom-ggproto.
Only the layer-function need be visible to the user. You only need to write a stat-ggproto if you need some new way of transforming your data to make your primitive. And you only need write a geom-ggproto if you have some new grid-based graphics to create.
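For illustration only, here is a minimal sketch of what a stat-ggproto plus its layer-function could look like if the median labels were computed by a dedicated stat. The names StatMedianLabel, stat_median_label and the digits parameter are hypothetical and are not part of ggplot2 or of the code in this answer.

```r
library(ggplot2)

# Hypothetical stat: reduces each group to one row holding its median.
StatMedianLabel <- ggproto("StatMedianLabel", Stat,
  required_aes = c("x", "y"),
  compute_group = function(data, scales, digits = 3) {
    med <- median(data$y)
    # One row per group: position of the label and its text.
    data.frame(x = data$x[1], y = med, label = round(med, digits))
  }
)

# Hypothetical layer-function: the only part the user needs to see.
stat_median_label <- function(mapping = NULL, data = NULL, geom = "label",
                              position = "identity", ..., digits = 3,
                              na.rm = FALSE, show.legend = NA,
                              inherit.aes = TRUE) {
  layer(
    stat = StatMedianLabel, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(digits = digits, na.rm = na.rm, ...)
  )
}

# Usage: median labels on top of an ordinary boxplot.
p <- ggplot(mpg, aes(class, hwy)) +
  geom_boxplot() +
  stat_median_label()
```

This is only worth doing when the transformation is genuinely new; for median labels, stat_summary already covers it.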
In this case, where we are basically composing layer-functions that already exist, we don't really need to write new ggprotos. It is enough to write a new layer-function. This layer-function will create the three layers that you are already using and map the parameters the way you intend. In this case:
geom_errorbar and stat_boxplot: to get our error bars
geom_boxplot and stat_boxplot: to create the boxplots
geom_label and stat_summary: to create the text labels with the median value in the center of the boxes
Of course you could write a new stat-ggproto and a new geom-ggproto that do all of these things at once. Or maybe you compose stat_summary and stat_boxplot into one, and the three geom-ggprotos as well, and do this with one layer. But there is little point unless we have efficiency problems.
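The mechanism this relies on is that adding a list to a ggplot adds each element of the list in turn, so a single function can contribute several layers. A tiny sketch with an unrelated pair of geoms (geom_point_line is a hypothetical name chosen for this example):

```r
library(ggplot2)

# Hypothetical helper: returns two layers that share the same arguments.
geom_point_line <- function(...) {
  list(geom_line(...), geom_point(...))
}

# Both layers are added by the single `+`.
p <- ggplot(pressure, aes(temperature, pressure)) +
  geom_point_line(colour = "grey40")
```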
Anyway, here is the code:
geom_myboxplot <- function(formula = NULL, data = NULL,
                           stat = "boxplot", position = "dodge", coef = 1.5,
                           font = "sans", fsize = 18, width = 0.6,
                           fun.data = NULL, fun.y = NULL, fun.ymax = NULL,
                           fun.ymin = NULL, fun.args = list(),
                           outlier.colour = NULL, outlier.color = NULL,
                           outlier.shape = 19, outlier.size = 1.5, outlier.stroke = 0.5,
                           notch = FALSE, notchwidth = 0.5, varwidth = FALSE,
                           na.rm = FALSE, show.legend = NA,
                           inherit.aes = TRUE, ...) {
  vars <- all.vars(formula)
  response <- vars[1]
  factor <- vars[2]
  mymap <- aes_string(x = factor, y = response)
  fun_med <- function(x) {
    return(data.frame(y = median(x), label = round(median(x), 3)))
  }
  position <- position_dodge(width)
  l1 <- layer(data = data, mapping = mymap, stat = StatBoxplot,
              geom = "errorbar", position = position, show.legend = show.legend,
              inherit.aes = inherit.aes, params = list(na.rm = na.rm,
              coef = coef, width = width, ...))
  l2 <- layer(data = data, mapping = mymap, stat = stat, geom = GeomBoxplot,
              position = position, show.legend = show.legend, inherit.aes = inherit.aes,
              params = list(outlier.colour = outlier.colour, outlier.shape = outlier.shape,
              outlier.size = outlier.size, outlier.stroke = outlier.stroke,
              notch = notch, notchwidth = notchwidth, varwidth = varwidth,
              na.rm = na.rm, ...))
  l3 <- layer(data = data, mapping = mymap, stat = StatSummary,
              geom = "label", position = position, show.legend = show.legend,
              inherit.aes = inherit.aes, params = list(fun.data = fun_med,
              fun.y = fun.y, fun.ymax = fun.ymax, fun.ymin = fun.ymin,
              fun.args = fun.args, na.rm = na.rm, family = font, size = fsize/3, vjust = -0.1, ...))
  return(list(l1, l2, l3))
}
which allows you to create your customized boxplots like this:
ggplot(mpg) +
geom_myboxplot( hwy ~ class, font = "sans",fsize = 18)+
theme_grey(base_family = "sans",base_size = 18 )
And they look like this:
Note: we did not actually have to use the layer function; we could have used the original stat_boxplot, geom_boxplot, and stat_summary calls in its place. But we still would have had to fill in all the parameters if we wanted to be able to control them from our custom boxplot, so I think it was clearer this way, at least from the point of view of structure as opposed to functionality. Maybe it isn't though; it is a matter of taste...
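For comparison, a rough sketch of that alternative, built from the ordinary stat_boxplot, geom_boxplot and stat_summary calls. The name geom_myboxplot2 and the digits argument are made up for this example, and it takes a regular aes() mapping instead of a formula, which is an assumption about what is convenient rather than something from the code above. Only a handful of parameters are forwarded, to keep it short.

```r
library(ggplot2)

# Hypothetical variant: same three layers, built with the user-facing helpers.
geom_myboxplot2 <- function(mapping = NULL, data = NULL, width = 0.6,
                            digits = 3, font = "sans", fsize = 18) {
  # Summary function producing the label position and text for each group.
  median_label <- function(x) {
    data.frame(y = median(x), label = round(median(x), digits))
  }
  # One shared dodge so all three layers line up.
  dodge <- position_dodge(width)
  list(
    stat_boxplot(mapping = mapping, data = data, geom = "errorbar",
                 width = width, position = dodge),
    geom_boxplot(mapping = mapping, data = data, width = width,
                 position = dodge),
    stat_summary(mapping = mapping, data = data, fun.data = median_label,
                 geom = "label", family = font, size = fsize / 3,
                 vjust = -0.1, position = dodge)
  )
}

p <- ggplot(mpg, aes(class, hwy)) +
  geom_myboxplot2()
```

Because the layers inherit the plot's aesthetics, adding something like fill = factor(year) to the ggplot() mapping should split and dodge the boxes, whiskers and labels together, although I have not tuned how the labels look in that case.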
Also, I don't have that font, which does look a lot nicer, but I did not feel like tracking it down and installing it.