Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot2 custom stat not shown when facetting

Tags:

r

ggplot2

ggproto

I'm trying to write a custom stat_* for ggplot2, where I would like to color a 2D loess surface using tiles. When I start from the extension guide, I can write a stat_chull like they do:

stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  chull = ggproto("chull", Stat,
    compute_group = function(data, scales) {
      data[chull(data$x, data$y), , drop = FALSE]
    },
    required_aes = c("x", "y")
  )

  layer(
    stat = chull, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

This work for both the simple call and facet wrapping:

ggplot(mpg, aes(x=displ, y=hwy)) + 
  geom_point() + 
  stat_chull()
# optionally + facet_wrap(~ class)

chull without facetting chull with facetting

When I write my stat_loess2d, I can also visualize all classes or an individual class:

stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
                       position = "identity", na.rm = FALSE, show.legend = NA, 
                       inherit.aes = TRUE, ...) {

  loess2d = ggproto("loess2d", Stat,
    compute_group = function(data, scales) {
      dens = MASS::kde2d(data$x, data$y)
      lsurf = loess(fill ~ x + y, data=data)
      df = data.frame(x = rep(dens$x, length(dens$y)),
                      y = rep(dens$y, each=length(dens$x)),
                      dens = c(dens$z))
      df$fill = predict(lsurf, newdata=df[c("x", "y")])
      df
    },
    required_aes = c("x", "y", "fill")
  )

  layer(
    stat = loess2d, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(na.rm = na.rm, ...)
  )
}

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d()

all loess2d one class loess2d

However, when I try to facet the above, tiles are no longer shown:

ggplot(mpg, aes(x=displ, y=hwy, fill=year)) + 
  geom_point(aes(color=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)

loess2d with facetting shows no tiles

Can someone tell me what I'm doing wrong here?

like image 605
Michael Schubert Avatar asked Aug 13 '19 10:08

Michael Schubert


1 Answers

Explanation

The main issue I see here actually lies beyond what you have done, & is related to how geom_tile handles tile creation across different facets, when the specific x / y axis values differ significantly. An older question demonstrated a similar phenomenon: geom_tile works fine with each facet's data on its own, but put them together, and the tiles shrink to match the smallest difference between different facets' values. This leaves vast amounts of white space in the plot layer, and usually gets progressively worse with every additional facet, until the tiles themselves become practically invisible.

To get around this, I would add a data processing step after the density / loess calculations for each facet, to standardize the range of x and y values across all facets.

Some elaboration in case you are not very familiar with the relationship between compute_layer, compute_panel, and compute_group (I certainly wasn't when I started messing around with ggproto objects...):

  • Essentially, all Stat* objects have these three functions to bridge the gap between a given dataframe (mpg in this case), and what's received by the Geom* side of things.

  • Of the three, compute_layer is the top-level function, and typically triggers compute_panel to calculate a separate dataframe for each facet / panel (the terminology used in exported functions is facet, but the underlying package code refers to the same as panel; I'm not sure why either). In turn, compute_panel triggers compute_group to calculate a separate dataframe for each group (as defined by the group / colour / fill / etc. aesthetic parameters).

  • The results from compute_group are returned to compute_panel and combined into one dataframe. Likewise, compute_layer receives one dataframe from each facet's compute_panel, and combines them together again. The combined dataframe is then passed on to Geom* to draw.

(Above is the generic set-up as defined in the top-level Stat. Other Stat* objects inheriting from Stat may override the behaviour in any of the steps. For example, StatIdentity's compute_layer returns the original dataframe as-is, without triggering compute_panel / compute_group at all, because there is no need to do so for unchanged data.)

For this use case, we can modify the code in compute_layer, after the results have been returned from compute_panel / compute_group and combined together, to interpolate values associated with each facet into common bins. Because common bins = nice large tiles without white space in between.

Modification

Here's how I would have written the loess2d ggproto object, with an additional definition for compute_layer:

loess2d = ggproto("loess2d", Stat,
                  compute_group = function(data, scales) {
                    dens = MASS::kde2d(data$x, data$y)
                    lsurf = loess(fill ~ x + y, data=data)
                    df = data.frame(x = rep(dens$x, length(dens$y)),
                                    y = rep(dens$y, each=length(dens$x)),
                                    dens = c(dens$z))
                    df$fill = predict(lsurf, newdata=df[c("x", "y")])
                    df
                  },
                  compute_layer = function(self, data, params, layout) {
                    # no change from Stat$compute_layer in this chunk, except
                    # for liberal usage of `ggplot2:::` to utilise un-exported
                    # functions from the package
                    ggplot2:::check_required_aesthetics(self$required_aes, 
                                                        c(names(data), names(params)), 
                                                        ggplot2:::snake_class(self))
                    data <- remove_missing(data, params$na.rm, 
                                           c(self$required_aes, self$non_missing_aes), 
                                           ggplot2:::snake_class(self),
                                           finite = TRUE)
                    params <- params[intersect(names(params), self$parameters())]
                    args <- c(list(data = quote(data), scales = quote(scales)), params)
                    df <- plyr::ddply(data, "PANEL", function(data) {
                      scales <- layout$get_scales(data$PANEL[1])
                      tryCatch(do.call(self$compute_panel, args), 
                               error = function(e) {
                                 warning("Computation failed in `", ggplot2:::snake_class(self), 
                                         "()`:\n", e$message, call. = FALSE)
                                 data.frame()
                               })
                    })

                    # define common x/y grid range across all panels
                    # (length = 25 chosen to match the default value for n in MASS::kde2d)
                    x.range <- seq(min(df$x), max(df$x), length = 25)
                    y.range <- seq(min(df$y), max(df$y), length = 25)

                    # interpolate each panel's data to a common grid,
                    # with NA values for regions where each panel doesn't
                    # have data (this can be changed via the extrap
                    # parameter in akima::interp, but I think  
                    # extrapolating may create misleading visuals)
                    df <- df %>% 
                      tidyr::nest(-PANEL) %>%
                      mutate(data = purrr::map(data, 
                                               ~akima::interp(x = .x$x, y = .x$y, z = .x$fill,
                                                              xo = x.range, yo = y.range,
                                                              nx = 25, ny = 25) %>%
                                                 akima::interp2xyz(data.frame = TRUE) %>%
                                                 rename(fill = z))) %>%
                      tidyr::unnest()

                    return(df)
                  },
                  required_aes = c("x", "y", "fill")
)

Usage:

ggplot(mpg,
       aes(x=displ, y=hwy, fill=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)
# this does trigger warnings (not errors) because some of the facets contain
# really very few observations. if we filter for facets with more rows of data
# in the original dataset, this wouldn't be an issue

ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")),
       aes(x=displ, y=hwy, fill=year)) + 
  stat_loess2d() +
  facet_wrap(~ class)
# no warnings triggered

results

like image 95
Z.Lin Avatar answered Oct 28 '22 08:10

Z.Lin