How to properly sort facet boxplots by median?

Tags: r, ggplot2

I'm using the 'diamonds' dataset that comes with ggplot2. I'm trying to sort the levels of the 'color' factor by their median price, but the boxplots don't come out sorted.

This is what I tried:

ggplot(diamonds, aes(x = reorder(color, -price, FUN=median), y = price)) + 
  geom_boxplot() + 
  facet_wrap(~cut) + 
  ylim(0, 5500)

The resulting plot is not sorted at all.

Is there something I'm doing wrong or missing?

filipetrm asked Dec 19 '22 03:12



1 Answer

Here is a relatively simple way of achieving the requested arrangement, using two helper functions available here:

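# Append the facet value to each x value so every x/facet pair becomes its own
# factor level, then order those levels by `fun` applied to `by`.
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Discrete x scale that strips the separator and facet suffix from the labels.
scale_x_reordered <- function(..., sep = "___") {
  reg <- paste0(sep, ".+$")
  ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}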
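To see what the two helpers do, here is a minimal sketch on made-up data. The `toy` data frame and its column names are assumptions for illustration only. The helper is repeated so the snippet runs on its own, and only base R is needed.

```r
# Helper copied from the answer above so this sketch runs on its own.
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Hypothetical toy data: two groups ("a", "b") inside two facets ("F1", "F2").
toy <- data.frame(
  group = c("a", "a", "b", "b", "a", "a", "b", "b"),
  facet = c("F1", "F1", "F1", "F1", "F2", "F2", "F2", "F2"),
  value = c(1, 3, 10, 12, 20, 22, 5, 7)
)

f <- reorder_within(toy$group, toy$value, toy$facet, fun = median)

# One level per group/facet pair, ordered by the median of `value`.
ordered_levels <- levels(f)
print(ordered_levels)
# "a___F1" "b___F2" "b___F1" "a___F2"

# The same regex that scale_x_reordered() uses to recover the display labels.
clean_labels <- gsub("___.+$", "", ordered_levels)
print(clean_labels)
# "a" "b" "b" "a"
```

With scales = "free_x", each facet shows only its own levels, so F1 displays a then b, while F2 displays b then a. The order is ascending; negating `by` (as the question does with -price) should give a descending order instead.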

library(tidyverse)
data(diamonds)

p <- ggplot(diamonds, aes(x = reorder_within(color, price, cut, median), y = price)) + 
  geom_boxplot(width = 5) + 
  scale_x_reordered()+
  facet_wrap(~cut,  scales = "free_x")


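Using ylim(0, 5500) removes every observation with a price above 5500 before the boxplot statistics are computed. The boxes are then drawn from a truncated dataset, so their medians no longer match the order that was computed from the full data. If you want to limit the axis without discarding data, it is better to use coord_cartesian():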

p + coord_cartesian(ylim = c(0, 5500))
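The difference between the two approaches can be checked numerically. The sketch below is an illustration, not part of the original answer: it uses an unfaceted plot to keep things short and reads the medians ggplot2 actually draws via layer_data(). The variable names are made up for this example.

```r
library(ggplot2)

base_plot <- ggplot(diamonds, aes(x = color, y = price)) +
  geom_boxplot()

# Medians of the full data, one per colour, in factor-level order.
med_full <- as.numeric(tapply(diamonds$price, diamonds$color, median))

# ylim() sets scale limits: prices above 5500 are dropped before the
# statistics are computed (ggplot2 normally warns about the removed rows).
med_ylim <- suppressWarnings(layer_data(base_plot + ylim(0, 5500))$middle)

# coord_cartesian() only zooms the view; the statistics use all rows.
med_zoom <- layer_data(base_plot + coord_cartesian(ylim = c(0, 5500)))$middle

print(data.frame(color = levels(diamonds$color), med_full, med_ylim, med_zoom))
```

The med_zoom column should match med_full, while med_ylim should be lower wherever expensive diamonds were dropped.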


If you really intend to remove a large part of the data and still keep the arrangement, filter the data before plotting:

diamonds %>%
  filter(price < 5500) %>%
  ggplot(aes(x = reorder_within(color, price, cut, median), y = price)) + 
  geom_boxplot(width = 5) + 
  scale_x_reordered()+
  facet_wrap(~cut,  scales = "free_x")
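One way to sanity-check the arrangement is to compute the per-facet medians directly and compare them with the left-to-right order in each panel. This is a sketch under the same filter as above, and `facet_order` is a name chosen here for illustration.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Median price for every cut/color pair after the same filter as the plot,
# sorted the way each facet should read from left to right.
facet_order <- diamonds %>%
  filter(price < 5500) %>%
  group_by(cut, color) %>%
  summarise(median_price = median(price), .groups = "drop") %>%
  arrange(cut, median_price)

print(facet_order, n = 35)
```

Within each cut, the color column should list the boxes in the order they appear in the corresponding panel.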


missuse answered Jan 10 '23 06:01
