I'm trying to write a custom stat_*
for ggplot2
, where I would like to color a 2D loess surface using tiles. When I start from the extension guide, I can write a stat_chull like they do:
stat_chull = function(mapping = NULL, data = NULL, geom = "polygon",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
chull = ggproto("chull", Stat,
compute_group = function(data, scales) {
data[chull(data$x, data$y), , drop = FALSE]
},
required_aes = c("x", "y")
)
layer(
stat = chull, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
This work for both the simple call and facet wrapping:
ggplot(mpg, aes(x=displ, y=hwy)) +
geom_point() +
stat_chull()
# optionally + facet_wrap(~ class)
When I write my stat_loess2d
, I can also visualize all classes or an individual class:
stat_loess2d = function(mapping = NULL, data = NULL, geom = "tile",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...) {
loess2d = ggproto("loess2d", Stat,
compute_group = function(data, scales) {
dens = MASS::kde2d(data$x, data$y)
lsurf = loess(fill ~ x + y, data=data)
df = data.frame(x = rep(dens$x, length(dens$y)),
y = rep(dens$y, each=length(dens$x)),
dens = c(dens$z))
df$fill = predict(lsurf, newdata=df[c("x", "y")])
df
},
required_aes = c("x", "y", "fill")
)
layer(
stat = loess2d, data = data, mapping = mapping, geom = geom,
position = position, show.legend = show.legend, inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...)
)
}
ggplot(mpg, aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d()
ggplot(mpg[mpg$class == "compact",], aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d()
However, when I try to facet the above, tiles are no longer shown:
ggplot(mpg, aes(x=displ, y=hwy, fill=year)) +
geom_point(aes(color=year)) +
stat_loess2d() +
facet_wrap(~ class)
Can someone tell me what I'm doing wrong here?
The main issue I see here actually lies beyond what you have done, & is related to how geom_tile
handles tile creation across different facets, when the specific x / y axis values differ significantly. An older question demonstrated a similar phenomenon: geom_tile
works fine with each facet's data on its own, but put them together, and the tiles shrink to match the smallest difference between different facets' values. This leaves vast amounts of white space in the plot layer, and usually gets progressively worse with every additional facet, until the tiles themselves become practically invisible.
To get around this, I would add a data processing step after the density / loess calculations for each facet, to standardize the range of x and y values across all facets.
Some elaboration in case you are not very familiar with the relationship between compute_layer
, compute_panel
, and compute_group
(I certainly wasn't when I started messing around with ggproto objects...):
Essentially, all Stat*
objects have these three functions to bridge the gap between a given dataframe (mpg
in this case), and what's received by the Geom*
side of things.
Of the three, compute_layer
is the top-level function, and typically triggers compute_panel
to calculate a separate dataframe for each facet / panel (the terminology used in exported functions is facet, but the underlying package code refers to the same as panel; I'm not sure why either). In turn, compute_panel
triggers compute_group
to calculate a separate dataframe for each group (as defined by the group
/ colour
/ fill
/ etc. aesthetic parameters).
The results from compute_group
are returned to compute_panel
and combined into one dataframe. Likewise, compute_layer
receives one dataframe from each facet's compute_panel
, and combines them together again. The combined dataframe is then passed on to Geom*
to draw.
(Above is the generic set-up as defined in the top-level Stat
. Other Stat*
objects inheriting from Stat
may override the behaviour in any of the steps. For example, StatIdentity
's compute_layer
returns the original dataframe as-is, without triggering compute_panel
/ compute_group
at all, because there is no need to do so for unchanged data.)
For this use case, we can modify the code in compute_layer
, after the results have been returned from compute_panel
/ compute_group
and combined together, to interpolate values associated with each facet into common bins. Because common bins = nice large tiles without white space in between.
Here's how I would have written the loess2d
ggproto object, with an additional definition for compute_layer
:
loess2d = ggproto("loess2d", Stat,
compute_group = function(data, scales) {
dens = MASS::kde2d(data$x, data$y)
lsurf = loess(fill ~ x + y, data=data)
df = data.frame(x = rep(dens$x, length(dens$y)),
y = rep(dens$y, each=length(dens$x)),
dens = c(dens$z))
df$fill = predict(lsurf, newdata=df[c("x", "y")])
df
},
compute_layer = function(self, data, params, layout) {
# no change from Stat$compute_layer in this chunk, except
# for liberal usage of `ggplot2:::` to utilise un-exported
# functions from the package
ggplot2:::check_required_aesthetics(self$required_aes,
c(names(data), names(params)),
ggplot2:::snake_class(self))
data <- remove_missing(data, params$na.rm,
c(self$required_aes, self$non_missing_aes),
ggplot2:::snake_class(self),
finite = TRUE)
params <- params[intersect(names(params), self$parameters())]
args <- c(list(data = quote(data), scales = quote(scales)), params)
df <- plyr::ddply(data, "PANEL", function(data) {
scales <- layout$get_scales(data$PANEL[1])
tryCatch(do.call(self$compute_panel, args),
error = function(e) {
warning("Computation failed in `", ggplot2:::snake_class(self),
"()`:\n", e$message, call. = FALSE)
data.frame()
})
})
# define common x/y grid range across all panels
# (length = 25 chosen to match the default value for n in MASS::kde2d)
x.range <- seq(min(df$x), max(df$x), length = 25)
y.range <- seq(min(df$y), max(df$y), length = 25)
# interpolate each panel's data to a common grid,
# with NA values for regions where each panel doesn't
# have data (this can be changed via the extrap
# parameter in akima::interp, but I think
# extrapolating may create misleading visuals)
df <- df %>%
tidyr::nest(-PANEL) %>%
mutate(data = purrr::map(data,
~akima::interp(x = .x$x, y = .x$y, z = .x$fill,
xo = x.range, yo = y.range,
nx = 25, ny = 25) %>%
akima::interp2xyz(data.frame = TRUE) %>%
rename(fill = z))) %>%
tidyr::unnest()
return(df)
},
required_aes = c("x", "y", "fill")
)
Usage:
ggplot(mpg,
aes(x=displ, y=hwy, fill=year)) +
stat_loess2d() +
facet_wrap(~ class)
# this does trigger warnings (not errors) because some of the facets contain
# really very few observations. if we filter for facets with more rows of data
# in the original dataset, this wouldn't be an issue
ggplot(mpg %>% filter(!class %in% c("2seater", "minivan")),
aes(x=displ, y=hwy, fill=year)) +
stat_loess2d() +
facet_wrap(~ class)
# no warnings triggered
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With