
ggplot2 - where are the scales being built?

Tags: r, ggplot2, ggproto

I wanted to see where factor values are turned into numeric ones, so I copied a geom and a stat and added print statements to them:

geom_tile2 <- function(mapping = NULL, data = NULL,
                      stat = "identity2", position = "identity",
                      ...,
                      na.rm = FALSE,
                      show.legend = NA,
                      inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomTile2,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      ...
    )
  )
}

GeomTile2 <- ggproto("GeomTile2", GeomRect,
  extra_params = c("na.rm", "width", "height"),

  setup_data = function(data, params) {
    print(data)

    data$width <- data$width %||% params$width %||% resolution(data$x, FALSE)
    data$height <- data$height %||% params$height %||% resolution(data$y, FALSE)

    transform(data,
              xmin = x - width / 2,  xmax = x + width / 2,  width = NULL,
              ymin = y - height / 2, ymax = y + height / 2, height = NULL
    )
  },

  default_aes = aes(fill = "grey20", colour = NA, size = 0.1, linetype = 1,
                    alpha = NA),

  required_aes = c("x", "y"),

  draw_key = draw_key_polygon
)

and

stat_identity2 <- function(mapping = NULL, data = NULL,
                          geom = "point", position = "identity",
                          ...,
                          show.legend = NA,
                          inherit.aes = TRUE) {
  layer(
    data = data,
    mapping = mapping,
    stat = StatIdentity2,
    geom = geom,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = FALSE,
      ...
    )
  )
}

StatIdentity2 <- ggproto("StatIdentity2", Stat,

  setup_data = function(data, params) {
    print(data)
    data
  },
  compute_layer = function(data, scales, params) {
    print(data)
    print("stat end")
    data
  }
)

but when I run e.g.

ggplot(data.frame(x = rep(c("y", "n"), 6), y = rep(c("y", "n"), each = 6)), 
       aes(x = x, y = y)) + 
  geom_tile2()

x and y are already numeric by the time the stat's setup_data function is called, and they stay numeric from there on. Looking through the package's GitHub repo, I can't find where this conversion to coordinates actually happens.

eok asked Apr 08 '18 13:04




1 Answer

TL;DR

The conversion of factor x / y values to numeric positions is done by the ggplot2:::Layout$map_position() function (current code: layout.r).

Long explanation

I usually think of creating a plot with the ggplot2 package as two stages:

  1. Plot construction. This is when a new ggplot object (initialized via ggplot()) and all the geom_* / stat_* / facet_* / scale_* / coord_* components added to it are combined into a single ggplot object. If we write something like p <- ggplot(mpg, aes(class)) + geom_bar(), we stop here. GH code: plot-construction.r
  2. Plot rendering. This is when the combined ggplot object is converted into an object that can be rendered (via ggplot_build()) and further converted into a gtable of grobs (via ggplot_gtable()). This is usually triggered by the ggplot object's print / plot methods, but we can also use ggplotGrob(), which returns the converted gtable object directly, minus the printing step. GH code for ggplot_build / ggplot_gtable: plot-build.r
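The two stages can also be run by hand. This is only a minimal sketch, assuming ggplot2 is installed and using its bundled mpg dataset; the variable names are purely illustrative.

```r
library(ggplot2)

# Stage 1: plot construction. Nothing is computed yet; p only stores
# the data, the mappings, the layers and so on.
p <- ggplot(mpg, aes(class)) + geom_bar()

# Stage 2: plot rendering, split into its two steps.
built <- ggplot_build(p)       # layer data computed, scales trained and mapped
gt    <- ggplot_gtable(built)  # converted into a gtable of grobs

# layer_data() is a shortcut to the built data of a single layer.
d <- layer_data(p, 1)
str(d)

# ggplotGrob() runs both steps and returns the gtable without printing.
gt2 <- ggplotGrob(p)
```

Inspecting d is often the quickest way to see what a layer's data looks like once the build has finished, before reaching for the debugger.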

In my experience, most of the steps we might want to tweak fall within the plot rendering stage, and running debug() / debugonce() on ggplot2:::ggplot_build.ggplot / ggplot2:::ggplot_gtable.ggplot_built is a good first step to figure out where things happen.

In this case, after running

debugonce(ggplot2:::ggplot_build.ggplot)

ggplot(data.frame(x = rep(c("y", "n"), 6), 
                  y = rep(c("y", "n"), each = 6)), 
       aes(x = x, y = y)) + 
  geom_tile() # no need to use the self-defined geom_tile2 here

we can begin to step through the function:

> ggplot2:::ggplot_build.ggplot
function (plot) 
{
    plot <- plot_clone(plot)
    if (length(plot$layers) == 0) {
        plot <- plot + geom_blank()
    }
    layers <- plot$layers
    layer_data <- lapply(layers, function(y) y$layer_data(plot$data))
    scales <- plot$scales
    by_layer <- function(f) {
        out <- vector("list", length(data))
        for (i in seq_along(data)) {
            out[[i]] <- f(l = layers[[i]], d = data[[i]])
        }
        out
    }
    data <- layer_data
    data <- by_layer(function(l, d) l$setup_layer(d, plot))
    layout <- create_layout(plot$facet, plot$coordinates)
    data <- layout$setup(data, plot$data, plot$plot_env)
    data <- by_layer(function(l, d) l$compute_aesthetics(d, plot))
    data <- lapply(data, scales_transform_df, scales = scales)
    scale_x <- function() scales$get_scales("x")
    scale_y <- function() scales$get_scales("y")
    layout$train_position(data, scale_x(), scale_y())
    data <- layout$map_position(data)
    data <- by_layer(function(l, d) l$compute_statistic(d, layout))
    data <- by_layer(function(l, d) l$map_statistic(d, plot))
    scales_add_missing(plot, c("x", "y"), plot$plot_env)
    data <- by_layer(function(l, d) l$compute_geom_1(d))
    data <- by_layer(function(l, d) l$compute_position(d, layout))
    layout$reset_scales()
    layout$train_position(data, scale_x(), scale_y())
    layout$setup_panel_params()
    data <- layout$map_position(data)
    npscales <- scales$non_position_scales()
    if (npscales$n() > 0) {
        lapply(data, scales_train_df, scales = npscales)
        data <- lapply(data, scales_map_df, scales = npscales)
    }
    data <- by_layer(function(l, d) l$compute_geom_2(d))
    data <- by_layer(function(l, d) l$finish_statistics(d))
    data <- layout$finish_data(data)
    structure(list(data = data, layout = layout, plot = plot), 
        class = "ggplot_built")
}

In debug mode, we can check str(data[[i]]) after every step, to examine the data associated with layer i of the ggplot object (i = 1 in this case, since there's only 1 geom layer).

Browse[2]> 
debug: data <- lapply(data, scales_transform_df, scales = scales)
Browse[2]> 
debug: scale_x <- function() scales$get_scales("x")
Browse[2]> str(data[[1]]) # still factor after scales_transform_df step
'data.frame':   12 obs. of  4 variables:
 $ x    : Factor w/ 2 levels "n","y": 2 1 2 1 2 1 2 1 2 1 ...
 $ y    : Factor w/ 2 levels "n","y": 2 2 2 2 2 2 1 1 1 1 ...
 $ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ group: int  4 2 4 2 4 2 3 1 3 1 ...
  ..- attr(*, "n")= int 4

# ... omitted

debug: data <- layout$map_position(data)
Browse[2]> 
debug: data <- by_layer(function(l, d) l$compute_statistic(d, layout))
Browse[2]> str(data[[1]]) # numerical after map_position step
'data.frame':   12 obs. of  4 variables:
 $ x    : int  2 1 2 1 2 1 2 1 2 1 ...
 $ y    : int  2 2 2 2 2 2 1 1 1 1 ...
 $ PANEL: Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
 $ group: int  4 2 4 2 4 2 3 1 3 1 ...
  ..- attr(*, "n")= int 4

Stat*'s setup_data is triggered by data <- by_layer(function(l, d) l$compute_statistic(d, layout)) (see ggplot2:::Layer$compute_statistic), which happens after the first map_position step. This is why, when you insert a print statement in StatIdentity2$setup_data, the data is already in numerical form.

(And Geom*'s setup_data is triggered by data <- by_layer(function(l, d) l$compute_geom_1(d)), which happens even later.)
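The ordering can be checked without the debugger. The following is a rough sketch rather than anything from the ggplot2 docs: StatTrace, GeomTrace and trace_log are hypothetical names made up for illustration, and the sketch assumes the build calls the stat's setup_data before the geom's, as in the listing above.

```r
library(ggplot2)

# Hypothetical environment that records what each hook saw.
trace_log <- new.env()
trace_log$calls <- character()

# Stat that logs its setup_data call and whether x is numeric by then.
StatTrace <- ggproto("StatTrace", StatIdentity,
  setup_data = function(data, params) {
    trace_log$calls <- c(trace_log$calls, "stat")
    trace_log$stat_x_is_numeric <- is.numeric(data$x)
    data
  }
)

# Geom that logs its setup_data call, then defers to GeomTile.
GeomTrace <- ggproto("GeomTrace", GeomTile,
  setup_data = function(self, data, params) {
    trace_log$calls <- c(trace_log$calls, "geom")
    ggproto_parent(GeomTile, self)$setup_data(data, params)
  }
)

df <- data.frame(x = rep(c("y", "n"), 6), y = rep(c("y", "n"), each = 6))
p <- ggplot(df, aes(x = x, y = y)) +
  layer(stat = StatTrace, geom = GeomTrace, position = "identity")

# Building the plot triggers the hooks; nothing needs to be drawn.
invisible(ggplot_build(p))

trace_log$calls              # order in which the hooks ran
trace_log$stat_x_is_numeric  # whether x was numeric inside the stat
```

Recording into an environment instead of printing keeps the result available for inspection after the build.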

After identifying map_position as the step where things happen, we can run debug mode again & step into this function to see exactly what's going on. At this point, I'm afraid I don't really know what your use case is, so I'll have to leave you to it.
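As a starting point, the mapping of discrete values can be tried on a position scale in isolation. This is a hedged sketch: it assumes map_position ends up delegating to each position scale's map() method, and it calls the scale's train() / get_limits() / map() ggproto methods directly, which are not a documented user-facing workflow and may behave differently between ggplot2 versions.

```r
library(ggplot2)

x <- factor(c("y", "n", "y", "n"))

sc <- scale_x_discrete()  # the default position scale for a discrete x
sc$train(x)               # learn the limits from the data
lims <- sc$get_limits()   # the discrete limits the scale has learned

# Each value is replaced by its position within the limits.
mapped <- sc$map(x)

lims
as.numeric(mapped)
```

The mapped values line up with the integer x column seen in the debug output after the map_position step.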

Z.Lin answered Oct 27 '22 11:10