Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a multiple column facet function

I am trying to create a facet_multi_col() function, similar to the facet_col() function in ggforce - that allows for a facet layout with a space argument (which is not available in facet_wrap()) - but over multiple columns. As in the last plot below (created with grid.arrange()) I do not want the facets to necessarily align across rows as the heights in each facet will vary based on a categorical y variable that I wish to use.

I am finding myself well out of my depth with ggproto having read the extension guide. I think the best approach is to pass a layout matrix to dictate where to break columns for corresponding subsets of the data, and to build off facet_col in ggforce to include a space parameter - see the end of the question.

A quick illustration of my unsatisfactory options

No facet

library(tidyverse)
library(gapminder)
global_tile <- ggplot(data = gapminder, mapping = aes(x = year, y = fct_rev(country), fill = lifeExp)) +
  geom_tile()
global_tile

enter image description here I want to break down the plot by continents. I do not want such a long figure.

facet_wrap()

global_tile +
  facet_wrap(facets = "continent", scales = "free")

enter image description here facet_wrap() does not have a space argument which means the tiles are different sizes in each continent, using coord_equal() throws an error

facet_col() in ggforce

library(ggforce)
global_tile +
  facet_col(facets = "continent", scales = "free", space = "free", strip.position = "right") +
  theme(strip.text.y = element_text(angle = 0)) 

enter image description here Like the strips on the side. space argument sets all tiles to the same size. Still too long to fit on to a page.

grid.arrange() in gridExtra

Add a column column to data for where each continent should be placed

d <- gapminder %>%
  as_tibble() %>%
  mutate(col = as.numeric(continent), 
         col = ifelse(test = continent == "Europe", yes = 2, no = col),
         col = ifelse(test = continent == "Oceania", yes = 3, no = col))
head(d)
# # A tibble: 6 x 7
#   country     continent  year lifeExp      pop gdpPercap   col
#   <fct>       <fct>     <int>   <dbl>    <int>     <dbl> <dbl>
# 1 Afghanistan Asia       1952    28.8  8425333      779.     3
# 2 Afghanistan Asia       1957    30.3  9240934      821.     3
# 3 Afghanistan Asia       1962    32.0 10267083      853.     3
# 4 Afghanistan Asia       1967    34.0 11537966      836.     3
# 5 Afghanistan Asia       1972    36.1 13079460      740.     3
# 6 Afghanistan Asia       1977    38.4 14880372      786.     3
tail(d)
# # A tibble: 6 x 7
#   country  continent  year lifeExp      pop gdpPercap   col
#   <fct>    <fct>     <int>   <dbl>    <int>     <dbl> <dbl>
# 1 Zimbabwe Africa     1982    60.4  7636524      789.     1
# 2 Zimbabwe Africa     1987    62.4  9216418      706.     1
# 3 Zimbabwe Africa     1992    60.4 10704340      693.     1
# 4 Zimbabwe Africa     1997    46.8 11404948      792.     1
# 5 Zimbabwe Africa     2002    40.0 11926563      672.     1
# 6 Zimbabwe Africa     2007    43.5 12311143      470.     1

Use facet_col() for plot for each column

g <- list()
for(i in unique(d$col)){
  g[[i]] <- d %>%
    filter(col == i) %>%
    ggplot(mapping = aes(x = year, y = fct_rev(country), fill = lifeExp)) +
    geom_tile() +
    facet_col(facets = "continent", scales = "free_y", space = "free", strip.position = "right") +
    theme(strip.text.y = element_text(angle = 0)) +
    # aviod legends in every column
    guides(fill = FALSE) +
    labs(x = "", y = "")
}

Create a legend using get_legend() in cowplot

library(cowplot)
gg <- ggplot(data = d, mapping = aes(x = year, y = country, fill = lifeExp)) +
  geom_tile()
leg <- get_legend(gg)

Create a layout matrix with heights based on number of countries in each column.

m <- 
  d %>%
  group_by(col) %>%
  summarise(row = n_distinct(country)) %>%
  rowwise() %>%
  mutate(row = paste(1:row, collapse = ",")) %>%
  separate_rows(row) %>%
  mutate(row = as.numeric(row), 
         col = col, 
         p = col) %>% 
  xtabs(formula = p ~ row + col) %>%
  cbind(max(d$col) + 1) %>%
  ifelse(. == 0, NA, .)

head(m)
#   1 2 3  
# 1 1 2 3 4
# 2 1 2 3 4
# 3 1 2 3 4
# 4 1 2 3 4
# 5 1 2 3 4
# 6 1 2 3 4

tail(m)
#     1 2  3  
# 50  1 2 NA 4
# 51  1 2 NA 4
# 52  1 2 NA 4
# 53 NA 2 NA 4
# 54 NA 2 NA 4
# 55 NA 2 NA 4

Bring g and leg together using grid.arrange() in gridExtra

library(gridExtra)
grid.arrange(g[[1]], g[[2]], g[[3]], leg, layout_matrix = m, widths=c(0.32, 0.32, 0.32, 0.06))

enter image description here This is almost what I am after, but I am not satisfied as a) the tiles in different columns have different widths as the length of longest country and continent names are not equal and b) its a lot of code that needs to be tweaked each time I want to make a plot like this - with other data I want to arrange the facets by regions, e.g. "Western Europe" rather than continents or the number of countries changes - there are no Central Asian countries in the gapminder data.

Progress with creating a facet_multi_cols() function

I want to pass a layout matrix to a facet function, where the matrix would refer to each facet, and the function could then figure out the heights based on the number of spaces in each panel. For the above example the matrix would be:

my_layout <- matrix(c(1, NA, 2, 3, 4, 5), nrow = 2)
my_layout
#      [,1] [,2] [,3]
# [1,]    1    2    4
# [2,]   NA    3    5

As mentioned above, I have been adapting from the code in facet_col() to try and build a facet_multi_col() function. I have added a layout argument to provide matrix such as my_layout above, with the idea that, for example, the fourth and fifth level of the variable given to the facets argument is plotted in the third column.

facet_multi_col <- function(facets, layout, scales = "fixed", space = "fixed",
                      shrink = TRUE, labeller = "label_value",
                      drop = TRUE, strip.position = 'top') {
  # add space argument as in facet_col
  space <- match.arg(space, c('free', 'fixed'))
  facet <- facet_wrap(facets, col = col, dir = dir, scales = scales, shrink = shrink, labeller = labeller, drop = drop, strip.position = strip.position)
  params <- facet$params
  params <- facet$layout

  params$space_free <- space == 'free'
  ggproto(NULL, FacetMultiCols, shrink = shrink, params = params)
}

FacetMultiCols <- ggproto('FacetMultiCols', FacetWrap,
  # from FacetCols to allow for space argument to work
  draw_panels = function(self, panels, layout, x_scales, y_scales, ranges, coord, data, theme, params) {
    combined <- ggproto_parent(FacetWrap, self)$draw_panels(panels, layout, x_scales, y_scales, ranges, coord, data, theme, params)
    if (params$space_free) {
      widths <- vapply(layout$PANEL, function(i) diff(ranges[[i]]$x.range), numeric(1))
      panel_widths <- unit(widths, "null")
      combined$widths[panel_cols(combined)$l] <- panel_widths
    }
    combined
  }
  # adapt FacetWrap layout to set position on panels following the matrix given to layout in facet_multi_col().
  compute_layout = function(self, panels, layout, x_scales, y_scales, ranges, coord, data, theme, params) {
    layout <- ggproto_parent(FacetWrap, self)$compute_layout(panels, layout, x_scales, y_scales, ranges, coord, data, theme, params)
    # ???
)

I think I need to write something for the compute_layout part, but I am struggling to figure out how to do this.

like image 927
guyabel Avatar asked Nov 02 '19 02:11

guyabel


2 Answers

Disclaimer

I have never developed any facet, but I found the question interesting and enough of a challenge, so I gave it a try. It is not perfect yet and by far not tested with all subtleties which may occur depending on your plot, but it is a first draft from which you can work on.

Idea

facet_wrap sets out the panels in a table and each row has a certain height, which the panel fully occupies. gtable_add_grob says:

In the gtable model, grobs always fill up the complete table cell. If you want custom justification you might need to define the grob dimension in absolute units, or put it into another gtable that can then be added to the gtable instead of the grob.

This could be an interesting solution. However, I was not sure how to pursue that. Thus, I took a different approach:

  1. Create a custom layout, based on the layout parameter passed
  2. Let facet_wrap render all panels w.r.t. to the layout
  3. Use gtable_filter to grab the panel including its axes and strips
  4. Create a layout matrix. I tried 2 approaches: using a minimum number of rows and playing with height differences. And simply adding approximately as many rows as there are ticks on the y-axis. Both work similiar, the latter produces cleaner code, so I would use this one.
  5. Use gridExtra::arrangeGrob to arrange the panels according to the passed design and the created layout matrix

Results

The full code is a bit lengthy, but can be found below. Here are some graphs:

my_layout1 <- matrix(c(1, NA, 2, 3, 4, 5), nrow = 2)
my_layout2 <- matrix(c(1, 2, 3, 4, 5, NA), ncol = 2)

## Ex1
global_tile + facet_multi_col("continent", my_layout1, scales = "free_y", 
                              space = "free", strip.position = "top")

## Ex 2
global_tile + facet_multi_col("continent", my_layout1, scales = "free_y", 
                              space = "free", strip.position = "right")

## Ex 3 - shows that we need a minimum space for any plot 
global_tile + facet_multi_col("continent", my_layout1, scales = "free_y", 
                              space = "free", strip.position = "top", min_prop = 0)

## Ex 4
global_tile + facet_multi_col("continent", my_layout1, scales = "free_y", 
                              space = "fixed", strip.position = "right")

## Ex 5
global_tile + facet_multi_col("continent", my_layout2, scales = "free_y", 
                              space = "free")

Ex 1 Example 1 Ex 2 Example 2 Ex 3 Example 3 Ex 4 Example 4 Ex 5 Example 5

Restrictions

The code is far from being foolproof. Some issues I see already:

  • We (silently) assume that each column in the design starts with a non NA value (in general for a productive code, the passed layout needs to be checked carefully (do the dimensions fit? are there as many entries as panels? etc.)
  • Very small panels do not render well, so I had to add an minimum value for the height depending on the position of the strips
  • The effect of moving or adding axes or strips is not yet tested.

Code: one row per tick

## get strip and axis of a given panel
## Assumptions:
## - axis are adjacent to the panel, that is exactly +1/-1 positions to the t/b/l/r ...
## - ... unless there is a strip then it is +2/-2 
get_whole_panel <- function(panel_name,
                            table_layout) {
  target <- table_layout$layout %>%
    dplyr::filter(name == panel_name) %>%
    dplyr::select(row = t, col = l)
  stopifnot(NROW(target) == 1)
  pos <- unlist(target)
  dirs <- list(t = c(-1, 0),
               b = c(1, 0),
               l = c(0, -1),
               r = c(0, 1))
  filter_elems <- function(dir, 
                           type = c("axis", "strip")) {
    type <- match.arg(type)
    new_pos <- pos + dir
    res <- table_layout$layout %>%
      dplyr::filter(grepl(type, name),
                    l == new_pos["col"],
                    t == new_pos["row"]) %>%
      dplyr::pull(name)
    if (length(res)) res else NA
  }
  strip <- purrr::map_chr(dirs, filter_elems, type = "strip")
  strip <- strip[!is.na(strip)]
  dirs[[names(strip)]] <- 2 * dirs[[names(strip)]]
  axes  <- purrr::map_chr(dirs, filter_elems, type = "axis")
  gtable::gtable_filter(table_layout, paste(c(panel_name, axes, strip), collapse = "|"))
}


facet_multi_col <- function(facets, layout, scales = "fixed", space = "fixed",
                            shrink = TRUE, labeller = "label_value",
                            drop = TRUE, strip.position = "top", 
                            min_prop = ifelse(strip.position %in% c("top", "bottom"), 
                                              0.12, 0.1)) {
  space <- match.arg(space, c("free", "fixed"))
  if (space == "free") {
    ## if we ask for free space we need scales everywhere, so make sure they are included
    scales <- "free"
  }
  facet <- facet_wrap(facets, ncol = 1, scales = scales, shrink = shrink, 
                      labeller = labeller, drop = drop, strip.position = strip.position)
  params <- facet$params
  params$space_free <- space == "free"
  params$layout <- layout
  params$parent <- facet
  params$min_prop <- min_prop
  ggproto(NULL, FacetMultiCol, shrink = shrink, params = params)
}



render <- function(self, panels, layout, 
                   x_scales, y_scales, ranges, 
                   coord, data, theme, params) {
  combined <- ggproto_parent(FacetWrap, self)$draw_panels(panels, layout, 
                                                          x_scales, y_scales, ranges, 
                                                          coord, data, theme, params)
  if (params$space_free) {
    panel_names <- combined$layout$name
    panels <- lapply(panel_names[grepl("panel", panel_names)],
                     get_whole_panel,
                     table_layout = combined)

    ## remove zeroGrob panels
    zG <- sapply(panels, function(tg) all(sapply(tg$grobs, ggplot2:::is.zero)))
    panels <- panels[!zG]
    ## calculate height for each panel
    heights <- matrix(NA, NROW(params$layout), NCOL(params$layout))
    ## store the rounded range in the matrix cell corresponding to its position
    ## allow for a minimum space in dependence of the overall number of rows to
    ## render small panels well

    heights[as.matrix(layout[, c("ROW", "COL")])] <- vapply(ranges, function(r) 
      round(diff(r$y.range), 0), numeric(1))

    ## 12% should be the minimum height used by any panel if strip is on top otherwise 10%
    ## these values are empirical and can be changed
    min_height <- round(params$min_prop * max(colSums(heights, TRUE)), 0)
    heights[heights < min_height] <- min_height
    idx <- c(heights)
    idx[!is.na(idx)] <- seq_along(idx[!is.na(idx)])
    len_out <- max(colSums(heights, TRUE))
    i <- 0
    layout_matrix <- apply(heights, 2, function(col) {
      res <- unlist(lapply(col, function(n) {
        i <<- i + 1
        mark <- idx[i]
        if (is.na(n)) {
          NA
        } else {
          rep(mark, n)
        }
      }))
      len <- length(res)
      if (len < len_out) {
        res <- c(res, rep(NA, len_out - len))
      }
      res
    })

    ## set width of left axis to maximum width to align plots
    max_width <- max(do.call(grid::unit.c, lapply(panels, function(gt) gt$widths[1])))
    panels <- lapply(panels, function(p) {
      p$widths[1] <- max_width
      p
    })

    combined <- gridExtra::arrangeGrob(grobs = panels,
                            layout_matrix = layout_matrix,
                            as.table = FALSE)
    ## add name, such that find_panel can find the plotting area
    combined$layout$name <- paste("panel_", layout$LAB)
  }
  combined
}

layout <- function(data, params) {
  parent_layout <- params$parent$compute_layout(data, params)
  msg <- paste0("invalid ",
                sQuote("layout"),
                ". Falling back to ",
                sQuote("facet_wrap"),
                " layout")
  if (is.null(params$layout) ||
      !is.matrix(params$layout)) {
    warning(msg)
    parent_layout
  } else {
    ## smash layout into vector and remove NAs all done by sort
    layout <- params$layout
    panel_numbers <- sort(layout)
    if (!isTRUE(all.equal(sort(as.numeric(as.character(parent_layout$PANEL))),
                          panel_numbers))) {
      warning(msg)
      parent_layout
    } else {
      ## all good
      indices <- cbind(ROW = c(row(layout)),
                       COL = c(col(layout)),
                       PANEL = c(layout))
      indices <- indices[!is.na(indices[, "PANEL"]), ]
      ## delete row and col number from parent layout
      parent_layout$ROW <- parent_layout$COL <- NULL
      new_layout <- merge(parent_layout, 
                          indices,
                          by = "PANEL") %>%
        dplyr::arrange(PANEL)
      new_layout$PANEL <- factor(new_layout$PANEL)
      labs <- new_layout %>%
        dplyr::select(-PANEL,
                      -SCALE_X,
                      -SCALE_Y,
                      -ROW,
                      -COL) %>%
        dplyr::mutate(sep = "_") %>%
        do.call(paste, .)
      new_layout$LAB <- labs
      new_layout


    }
  }
}

FacetMultiCol <- ggproto("FacetMultiCol", FacetWrap,
                         compute_layout = layout,
                         draw_panels    = render)

Code: rows with different heights

## get strip and axis of a given panel
## Assumptions:
## - axis are adjacent to the panel, that is exactly +1/-1 positions to the t/b/l/r ...
## - ... unless there is a strip then it is +2/-2 
get_whole_panel <- function(panel_name,
                            table_layout) {
  target <- table_layout$layout %>%
    dplyr::filter(name == panel_name) %>%
    dplyr::select(row = t, col = l)
  stopifnot(NROW(target) == 1)
  pos <- unlist(target)
  dirs <- list(t = c(-1, 0),
               b = c(1, 0),
               l = c(0, -1),
               r = c(0, 1))
  filter_elems <- function(dir, 
                           type = c("axis", "strip")) {
    type <- match.arg(type)
    new_pos <- pos + dir
    res <- table_layout$layout %>%
      dplyr::filter(grepl(type, name),
                    l == new_pos["col"],
                    t == new_pos["row"]) %>%
      dplyr::pull(name)
    if (length(res)) res else NA
  }
  strip <- purrr::map_chr(dirs, filter_elems, type = "strip")
  strip <- strip[!is.na(strip)]
  dirs[[names(strip)]] <- 2 * dirs[[names(strip)]]
  axes  <- purrr::map_chr(dirs, filter_elems, type = "axis")
  gtable::gtable_filter(table_layout, paste(c(panel_name, axes, strip), collapse = "|"))
}


facet_multi_col <- function(facets, layout, scales = "fixed", space = "fixed",
                            shrink = TRUE, labeller = "label_value",
                            drop = TRUE, strip.position = "top") {
  space <- match.arg(space, c("free", "fixed"))
  if (space == "free") {
    ## if we ask for free space we need scales everywhere, so make sure they are included
    scales <- "free"
  }
  facet <- facet_wrap(facets, ncol = 1, scales = scales, shrink = shrink, 
                      labeller = labeller, drop = drop, strip.position = strip.position)
  params <- facet$params
  params$space_free <- space == "free"
  params$layout <- layout
  params$parent <- facet
  ggproto(NULL, FacetMultiCol, shrink = shrink, params = params)
}



render <- function(self, panels, layout, 
                   x_scales, y_scales, ranges, 
                   coord, data, theme, params) {
  combined <- ggproto_parent(FacetWrap, self)$draw_panels(panels, layout, 
                                                          x_scales, y_scales, ranges, 
                                                          coord, data, theme, params)
  if (params$space_free) {
    panel_names <- combined$layout$name
    panels <- lapply(panel_names[grepl("panel", panel_names)],
                     get_whole_panel,
                     table_layout = combined)

    ## remove zeroGrob panels
    zG <- sapply(panels, function(tg) all(sapply(tg$grobs, ggplot2:::is.zero)))
    panels <- panels[!zG]

    ## calculate height for each panel
    heights <- matrix(NA, NROW(params$layout), NCOL(params$layout))
    ## need to add a minimum height as otherwise the space is too narrow
    heights[as.matrix(layout[, c("ROW", "COL")])] <- vapply(layout$PANEL, function(i) 
      max(diff(ranges[[i]]$y.range), 8), numeric(1))
    heights_cum <- sort(unique(unlist(apply(heights, 2, 
                                            function(col) cumsum(col[!is.na(col)])))))
    heights_units <- unit(c(heights_cum[1], diff(heights_cum)), "null")

    ## set width of left axis to maximum width to align plots
    max_width <- max(do.call(grid::unit.c, lapply(panels, function(gt) gt$widths[1])))
    panels <- lapply(panels, function(p) {
      p$widths[1] <- max_width
      p
    })

    mark <- 0

    ## create layout matrix
    layout_matrix <- apply(heights, 2, function(h) {
      idx <- match(cumsum(h),
              cumsum(c(heights_units)))
      idx <- idx[!is.na(idx)]
      res <- unlist(purrr::imap(idx, function(len_out, pos) {
        mark <<- mark + 1
        offset <- if (pos != 1) idx[pos - 1] else 0
          rep(mark, len_out - offset)
      }))
      len_out <- length(res)
      if (len_out < length(heights_units)) {
        res <- c(res, rep(NA, length(heights_units) - len_out)) 
      }
      res
    }) 

    combined <- gridExtra::arrangeGrob(grobs = panels,
                                layout_matrix = layout_matrix,
                                heights = heights_units,
                                as.table = FALSE)
    ## add name, such that find_panel can find the plotting area
    combined$layout$name <- paste("panel_", layout$LAB)
  }
  combined
}

layout <- function(data, params) {
  parent_layout <- params$parent$compute_layout(data, params)
  msg <- paste0("invalid ",
                sQuote("layout"),
                ". Falling back to ",
                sQuote("facet_wrap"),
                " layout")
  if (is.null(params$layout) ||
      !is.matrix(params$layout)) {
    warning(msg)
    parent_layout
  } else {
    ## smash layout into vector and remove NAs all done by sort
    layout <- params$layout
    panel_numbers <- sort(layout)
    if (!isTRUE(all.equal(sort(as.numeric(as.character(parent_layout$PANEL))),
                          panel_numbers))) {
      warning(msg)
      parent_layout
    } else {
      ## all good
      indices <- cbind(ROW = c(row(layout)),
                       COL = c(col(layout)),
                       PANEL = c(layout))
      indices <- indices[!is.na(indices[, "PANEL"]), ]
      ## delete row and col number from parent layout
      parent_layout$ROW <- parent_layout$COL <- NULL
      new_layout <- merge(parent_layout, 
                          indices,
                          by = "PANEL") %>%
        dplyr::arrange(PANEL)
      new_layout$PANEL <- factor(new_layout$PANEL)
      labs <- new_layout %>%
        dplyr::select(-PANEL,
                      -SCALE_X,
                      -SCALE_Y,
                      -ROW,
                      -COL) %>%
        dplyr::mutate(sep = "_") %>%
        do.call(paste, .)
      new_layout$LAB <- labs
      new_layout


    }
  }
}

FacetMultiCol <- ggproto("FacetMultiCol", FacetWrap,
                         compute_layout = layout,
                         draw_panels    = render)
like image 88
thothal Avatar answered Oct 07 '22 08:10

thothal


As suggested in the comments, a combination of cowplot and patchwork can get you fairly far. See my solution below.

The basic idea is:

  • to first calculate a scaling factor, based on the number of rows,
  • then make a series of single column grids, where I use empty plots to constrain the height of the plots with the calulated scaling factor. (and remove the legends)
  • then I add these into a grid and also add a legend.
  • in the beginning, I also calculate a maximum for the fill scale.
library(tidyverse)
library(gapminder)
library(patchwork)
max_life <- max(gapminder$lifeExp)
generate_plot <- function(data, title){
  ggplot(data = data, mapping = aes(x = year, y = fct_rev(country), fill = lifeExp)) +
    geom_tile()+
    scale_fill_continuous(limits = c(0, max_life)) +
    ggtitle(title)
}
scale_plot <- function(plot, ratio){
  plot + theme(legend.position="none") + 
    plot_spacer() + 
    plot_layout(ncol = 1,
                heights = c(
                  ratio,
                  1-ratio
                )
    )
}
df <- gapminder %>% 
  group_by(continent) %>% 
  nest() %>% 
  ungroup() %>% 
  arrange(continent) %>% 
  mutate(
    rows = map_dbl(data, nrow),
    rel_height = (rows/max(rows)),
    plot = map2(
      data,
      continent,
      generate_plot
    ),
    spaced_plot = map2(
      plot,
      rel_height,
      scale_plot
        )
  )
wrap_plots(df$spaced_plot) + cowplot::get_legend(df$plot[[1]])

Created on 2019-11-06 by the reprex package (v0.3.0)

like image 20
Benjamin Schwetz Avatar answered Oct 07 '22 07:10

Benjamin Schwetz