ggplot2 boxplot medians aren't plotting as expected

So, I have a fairly large dataset (Dropbox: csv file) that I'm trying to plot using geom_boxplot. The following produces what appears to be a reasonable plot:

library(reshape2)
library(ggplot2)
library(scales)
library(grid)
library(gridExtra)

# "*" marks missing values in the csv file
df <- read.csv("\\Downloads\\boxplot.csv", na.strings = "*")
df$year <- factor(df$year, levels = c(2010,2011,2012,2013,2014), labels = c(2010,2011,2012,2013,2014))

# One boxplot per year, one facet row per station, y-axis limited to 0-15
d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) +
    facet_grid(station~.) +
    scale_y_continuous(limits = c(0, 15)) +
    theme(legend.position = "none")
d

However, when you dig a little deeper, problems creep in that freak me out. When I label the boxplot medians with their values, the following plot results.

df.m <- aggregate(value ~ year + station, data = df, FUN = median)
d <- d + geom_text(data = df.m, aes(x = year, y = value, label = value)) 
d

[Figure: boxplots with medians labelled]

The medians plotted by geom_boxplot aren't at the medians at all. The labels are plotted at the correct y-axis values, but the middle lines of the boxplots are definitely not at the medians. I've been stumped by this for a few days now.

What is the reason for this? How can this type of display be produced with correct medians? How can this plot be debugged or diagnosed?

635

asked Mar 27 '15 15:03

Ryan Pugh

1 Answer

The cause of this problem is the use of scale_y_continuous. ggplot2 performs operations in the following order:

1. Scale transformations
2. Statistical computations
3. Coordinate transformations

In this case, because limits are set on the y scale, ggplot2 discards data outside those limits (by default, out-of-range values are replaced with NA) before the boxplot statistics are computed. The medians calculated by aggregate and used in geom_text, however, are based on the entire dataset. This is why the boxplot median lines and the text labels can disagree.
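The effect can be reproduced without any plotting. The following is a minimal sketch in base R, using a small made-up vector (not the data from the question), that mimics what happens to the values before the boxplot statistics are computed:

```r
# Hypothetical values: five lie inside the limits c(0, 15), four lie above.
x <- c(1, 2, 3, 4, 5, 20, 30, 40, 50)

# What aggregate(..., FUN = median) works with: every value.
median_all <- median(x)

# What the boxplot stat works with once limits c(0, 15) are set on the scale:
# out-of-range values become NA and are dropped before the median is taken.
x_censored <- ifelse(x >= 0 & x <= 15, x, NA)
median_censored <- median(x_censored, na.rm = TRUE)

print(c(all = median_all, censored = median_censored))
```

The two medians differ because the censored vector has lost its largest values, which is the same mismatch seen between the text labels and the boxplot lines.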

The solution is to omit the scale_y_continuous instruction and instead use:

d <- ggplot(data = df, aes(x = year, y = value)) +
    geom_boxplot(aes(fill = station)) +
    facet_grid(station~.) +
    theme(legend.position = "none") +
    coord_cartesian(ylim = c(0, 15))  # zooms the view without dropping data

This allows ggplot2 to calculate the boxplot statistics (hinges and median) using the entire dataset, while limiting only the visible y-range of the plot.
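As for diagnosing this kind of mismatch, one option is to inspect the statistics ggplot2 actually computed for the boxplot layer. The sketch below assumes a small made-up data frame (toy, not the data from the question) and uses layer_data() to compare the computed median under both approaches:

```r
library(ggplot2)

# Hypothetical stand-in for the real data: one year, some values above 15.
toy <- data.frame(year = factor(2010),
                  value = c(1, 2, 3, 4, 5, 20, 30, 40, 50))

# Limits set on the scale: out-of-range values are dropped before the stat.
p_scale <- ggplot(toy, aes(x = year, y = value)) +
    geom_boxplot() +
    scale_y_continuous(limits = c(0, 15))

# Limits set on the coordinate system: the stat sees all values.
p_coord <- ggplot(toy, aes(x = year, y = value)) +
    geom_boxplot() +
    coord_cartesian(ylim = c(0, 15))

# layer_data() returns the data the layer draws, including the computed
# boxplot statistics; the "middle" column is the median line.
# suppressWarnings() hides the warning about removed rows.
middle_scale <- suppressWarnings(layer_data(p_scale)$middle)
middle_coord <- layer_data(p_coord)$middle

print(c(scale_limits = middle_scale, coord_limits = middle_coord))
```

If the two values differ, as they should in this toy case, the scale limits are removing data before the statistics are computed. The warning about removed rows that ggplot2 prints when drawing the first plot is another hint.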

124

answered Oct 03 '22 11:10

Ryan Pugh

Tags: r, ggplot2, median, boxplot
