I wanted to see where factor values are turned into numeric ones. I tried to achieve this by simply adding print
statements everywhere...
geom_tile2 <- function(mapping = NULL, data = NULL,
                       stat = "identity2", position = "identity",
                       ...,
                       na.rm = FALSE,
                       show.legend = NA,
                       inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomTile2,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      ...
    )
  )
}
GeomTile2 <- ggproto("GeomTile2", GeomRect,
  extra_params = c("na.rm", "width", "height"),
  setup_data = function(data, params) {
    print(data)
    data$width <- data$width %||% params$width %||% resolution(data$x, FALSE)
    data$height <- data$height %||% params$height %||% resolution(data$y, FALSE)
    transform(data,
      xmin = x - width / 2, xmax = x + width / 2, width = NULL,
      ymin = y - height / 2, ymax = y + height / 2, height = NULL
    )
  },
  default_aes = aes(fill = "grey20", colour = NA, size = 0.1, linetype = 1,
                    alpha = NA),
  required_aes = c("x", "y"),
  draw_key = draw_key_polygon
)
and
stat_identity2 <- function(mapping = NULL, data = NULL,
                           geom = "point", position = "identity",
                           ...,
                           show.legend = NA,
                           inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatIdentity2,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = FALSE,
      ...
    )
  )
}
StatIdentity2 <- ggproto("StatIdentity2", Stat,
  setup_data = function(data, params) {
    print(data)
    data
  },
  # Arguments are passed positionally as (data, params, layout)
  compute_layer = function(data, params, layout) {
    print(data)
    print("stat end")
    data
  }
)
but when I run e.g.
ggplot(data.frame(x = rep(c("y", "n"), 6), y = rep(c("y", "n"), each = 6)),
       aes(x = x, y = y)) +
  geom_tile2()
the x and y are already numeric from the stat's setup_data function onwards. Looking through the package's GitHub repo, I just can't seem to find where this conversion to coordinates actually happens.
The conversion from factors to a numerical scale for x / y is done by the ggplot2:::Layout$map_position() function, current code here: layout.r
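As a quick check that does not need the debugger, the built layer data can be inspected with the exported layer_data() helper. This is only a sketch: it assumes a reasonably recent ggplot2, and the exact class of the mapped columns (plain integer or a numeric subclass) varies between versions, so the code only tests that they are numeric and no longer factors.

```r
library(ggplot2)

df <- data.frame(x = rep(c("y", "n"), 6),
                 y = rep(c("y", "n"), each = 6))
p <- ggplot(df, aes(x = x, y = y)) + geom_tile()

# layer_data() builds the plot and returns the data for layer 1
built_layer <- layer_data(p, 1)

# Discrete positions have been replaced by numeric ones by this point
print(class(built_layer$x))
print(is.numeric(built_layer$x))
print(is.factor(built_layer$x))
```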
I usually think of the steps involved in creating a plot using the ggplot2 package in two stages:
1. Plot construction: a ggplot object is initialised (via ggplot()) & all geom_* / stat_* / facet_* / scale_* / coord_* layers added to it are combined into a single ggplot object. If we write something like p <- ggplot(mpg, aes(class)) + geom_bar(), we stop here. GH code here: plot-construction.r
2. Plot rendering: the combined ggplot object is converted into an object that can be rendered (via ggplot_build()) and further converted into a gtable of grobs (via ggplot_gtable()). This is usually triggered via the ggplot object's print / plot methods (see here), but we can also use ggplotGrob(), which returns the converted gtable object directly, minus the printing step. GH code for ggplot_build / ggplot_gtable here: plot-build.r
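The two stages can be run by hand, which makes it easier to see which object exists at which point. A minimal sketch using only exported ggplot2 and grid functions (the variable names are mine):

```r
library(ggplot2)

# Stage 1: construction. Nothing is computed or drawn yet.
p <- ggplot(mpg, aes(class)) + geom_bar()

# Stage 2: rendering, split into its two explicit steps.
built <- ggplot_build(p)       # scales trained, stats computed, positions mapped
gt <- ggplot_gtable(built)     # converted into a gtable of grobs

# Drawing the gtable is what print(p) would otherwise do for us.
grid::grid.newpage()
grid::grid.draw(gt)

# ggplotGrob() runs both rendering steps and returns the gtable directly.
gt2 <- ggplotGrob(p)
```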
In my experience, most of the steps we might be interested in tweaking are those within the plot rendering stage, and running debug on ggplot2:::ggplot_build.ggplot / ggplot2:::ggplot_gtable.ggplot_built is a good first step to figure out where things happen.
In this case, after running
debugonce(ggplot2:::ggplot_build.ggplot)
ggplot(data.frame(x = rep(c("y", "n"), 6),
                  y = rep(c("y", "n"), each = 6)),
       aes(x = x, y = y)) +
  geom_tile() # no need to use the self-defined geom_tile2 here
we begin to step through the function:
> ggplot2:::ggplot_build.ggplot
function (plot)
{
    plot <- plot_clone(plot)
    if (length(plot$layers) == 0) {
        plot <- plot + geom_blank()
    }
    layers <- plot$layers
    layer_data <- lapply(layers, function(y) y$layer_data(plot$data))
    scales <- plot$scales
    by_layer <- function(f) {
        out <- vector("list", length(data))
        for (i in seq_along(data)) {
            out[[i]] <- f(l = layers[[i]], d = data[[i]])
        }
        out
    }
    data <- layer_data
    data <- by_layer(function(l, d) l$setup_layer(d, plot))
    layout <- create_layout(plot$facet, plot$coordinates)
    data <- layout$setup(data, plot$data, plot$plot_env)
    data <- by_layer(function(l, d) l$compute_aesthetics(d, plot))
    data <- lapply(data, scales_transform_df, scales = scales)
    scale_x <- function() scales$get_scales("x")
    scale_y <- function() scales$get_scales("y")
    layout$train_position(data, scale_x(), scale_y())
    data <- layout$map_position(data)
    data <- by_layer(function(l, d) l$compute_statistic(d, layout))
    data <- by_layer(function(l, d) l$map_statistic(d, plot))
    scales_add_missing(plot, c("x", "y"), plot$plot_env)
    data <- by_layer(function(l, d) l$compute_geom_1(d))
    data <- by_layer(function(l, d) l$compute_position(d, layout))
    layout$reset_scales()
    layout$train_position(data, scale_x(), scale_y())
    layout$setup_panel_params()
    data <- layout$map_position(data)
    npscales <- scales$non_position_scales()
    if (npscales$n() > 0) {
        lapply(data, scales_train_df, scales = npscales)
        data <- lapply(data, scales_map_df, scales = npscales)
    }
    data <- by_layer(function(l, d) l$compute_geom_2(d))
    data <- by_layer(function(l, d) l$finish_statistics(d))
    data <- layout$finish_data(data)
    structure(list(data = data, layout = layout, plot = plot),
        class = "ggplot_built")
}
In debug mode, we can check str(data[[i]]) after every step, to examine the data associated with layer i of the ggplot object (i = 1 in this case, since there's only 1 geom layer).
Browse[2]>
debug: data <- lapply(data, scales_transform_df, scales = scales)
Browse[2]>
debug: scale_x <- function() scales$get_scales("x")
Browse[2]> str(data[[1]]) # still factor after scales_transform_df step
'data.frame': 12 obs. of 4 variables:
$ x : Factor w/ 2 levels "n","y": 2 1 2 1 2 1 2 1 2 1 ...
$ y : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 1 1 1 1 ...
$ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
$ group: int 4 2 4 2 4 2 3 1 3 1 ...
..- attr(*, "n")= int 4
# ... omitted
debug: data <- layout$map_position(data)
Browse[2]>
debug: data <- by_layer(function(l, d) l$compute_statistic(d, layout))
Browse[2]> str(data[[1]]) # numerical after map_position step
'data.frame': 12 obs. of 4 variables:
$ x : int 2 1 2 1 2 1 2 1 2 1 ...
$ y : int 2 2 2 2 2 2 1 1 1 1 ...
$ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
$ group: int 4 2 4 2 4 2 3 1 3 1 ...
..- attr(*, "n")= int 4
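The change between the two str() outputs above (factor levels "n", "y" becoming the integers 1, 2) can be mimicked in base R. This is a conceptual sketch of what mapping a discrete value to a position amounts to, not ggplot2's actual implementation; the names limits and x_mapped are mine, and it assumes the scale's limits are the factor levels in their default order.

```r
# Toy input mirroring the first few x values from the example data
x <- factor(c("y", "n", "y", "n"))

# Assumption: the scale's limits are the factor levels in their default order
limits <- levels(x)

# Each value is replaced by its position among the limits
x_mapped <- match(as.character(x), limits)

print(limits)    # "n" "y"
print(x_mapped)  # 2 1 2 1
```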
Stat*'s setup_data is triggered by data <- by_layer(function(l, d) l$compute_statistic(d, layout)) (see ggplot2:::Layer$compute_statistic here), which happens after this step. This is why when you insert a print statement in StatIdentity2$setup_data, the data is already in numerical form.
(And Geom*'s setup_data is triggered by data <- by_layer(function(l, d) l$compute_geom_1(d)), which happens even later.)
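To confirm this ordering without print statements, a throwaway stat can record what it receives in setup_data. StatPeek and the seen environment below are hypothetical names made up for this sketch; it assumes R >= 4.0 (so the string columns stay character in the data frame) and a ggplot2 version where Stat$compute_layer takes (self, data, params, layout).

```r
library(ggplot2)

# Environment used to capture what the stat sees (hypothetical helper)
seen <- new.env()

StatPeek <- ggproto("StatPeek", Stat,
  setup_data = function(data, params) {
    # Record the state of x at the moment the stat first touches the data
    seen$x_is_factor <- is.factor(data$x)
    seen$x_is_numeric <- is.numeric(data$x)
    data
  },
  # Pass the data through unchanged, like an identity stat
  compute_layer = function(self, data, params, layout) data
)

df <- data.frame(x = c("y", "n"), y = c("y", "n"))
p <- ggplot(df, aes(x, y)) + geom_point(stat = StatPeek)

# Building the plot is enough to trigger setup_data; no drawing needed
invisible(ggplot_build(p))

print(seen$x_is_factor)
print(seen$x_is_numeric)
```

If x is reported as numeric here, the discrete-to-numeric mapping has already happened by the time any stat's setup_data runs, which is consistent with map_position being called before compute_statistic.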
After identifying map_position as the step where things happen, we can run debug mode again & step into this function to see exactly what's going on. At this point, I'm afraid I don't really know what your use case is, so I'll have to leave you to it.