
R - ggplot2 'dodge' geom_step() to overlap geom_bar()

Plotting counts with ggplot2's geom_bar(stat="identity") is an effective way of visualising them. I would like to use this method to display my observed counts and compare them to expected counts by overlaying a stairstep layer, drawn with geom_step(), on the barplot.

However, when I do this I run into the problem that barplots by default have their positions dodged but geom_step() does not. For example, using both continuous and discrete x variables:

library(tidyverse)

test <- tibble(a = 1:10, b = runif(10, 1, 10))  # tibble() replaces the deprecated data_frame()

test_plot <- ggplot(test, aes(a, b)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red')

test2 <- tibble(a = letters[1:10], b = runif(10, 1, 10))

test2_plot <- ggplot(test2, aes(a, b, group = 1)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red')

gridExtra::grid.arrange(test_plot, test2_plot, ncol = 2)

[Image: test_plot and test2_plot side by side]

As you can see, the two layers are offset, which is undesirable.

Reading the docs, I see that geom_path() has a position argument. However, trying something like geom_step(color = 'red', position = position_dodge(width = 0.5)) does not do what I want; rather, it compresses the bars and the stairstep line towards the centre. Another option is to adjust the data directly, e.g. geom_step(aes(a - 0.5, b), color = 'red'), which produces a near-acceptable result for data with a continuous x variable. You could also calculate the stairstep line as a function and plot it using stat_function().

[Image: bar plot with the shifted stairstep line]

However, these approaches are not applicable to data with a discrete x variable, and my actual data has a discrete x variable, so I need another answer.

Additionally, when shifted, the stairstep line does not cover the last bar, as seen in the above image. Is there an easy, elegant way to extend it to cover the last bar?

If geom_step() is the wrong approach and what I'm trying to get can be achieved in another way, I am interested in that too.

G_T asked Apr 16 '17 07:04


3 Answers

I think the most efficient way to solve this problem is to define a custom geom-like function in the following way:

library(tidyverse)

geom_step_extend <- function(data, extend = 1, nudge = -0.5,
                             ...) {
  # Function for computing the last segment data
  get_step_extend_data <- function(data, extend = 1, nudge = -0.5) {
    data_out <- as.data.frame(data[order(data[[1]]), ])
    n <- nrow(data)
    max_x_y <- data_out[n, 2]
    if (is.numeric(data_out[[1]])) {
      max_x <- data_out[n, 1] + nudge
    } else {
      max_x <- n + nudge
    }

    data.frame(x = max_x,
               y = max_x_y,
               xend = max_x + extend,
               yend = max_x_y)
  }

  # The resulting geom
  list(
    geom_step(position = position_nudge(x = nudge), ...),
    geom_segment(
      data = get_step_extend_data(data, extend = extend, nudge = nudge),
      mapping = aes(x = x, y = y,
                    xend = xend, yend = yend),
      ...
    )
  )
}

set.seed(111)
test <- tibble(a = 1:10, b = runif(10, 1, 10))  # tibble() replaces the deprecated data_frame()
test2 <- tibble(a = letters[1:10], b = runif(10, 1, 10))

test_plot <- ggplot(test, aes(a, b, group = 1)) + 
  geom_bar(stat = "identity") + 
  geom_step_extend(data = test, colour = "red")

test2_plot <- ggplot(test2, aes(a, b, group = 1)) + 
  geom_bar(stat = "identity") + 
  geom_step_extend(data = test2, colour = "red")

gridExtra::grid.arrange(test_plot, test2_plot, ncol = 2)


Basically, this solution consists of three parts:

  1. Nudge the step curve to the left by the desired value (in this case -0.5) with position_nudge;
  2. Compute the data for the missing segment (the one on the right) with the function get_step_extend_data. Its behaviour is inspired by ggplot2:::stairstep, which is the underlying function of geom_step;
  3. Combine geom_step and geom_segment into a single geom-like object with list.
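
As an alternative to the extra geom_segment() layer, the same idea can be sketched by padding the step data with one additional row, so that a single geom_step() reaches the right edge of the last bar. This is only a sketch: pad_steps() is a hypothetical helper (not part of ggplot2), and it assumes the bars sit one unit apart, which is how ggplot2 positions discrete levels (1, 2, ..., n).

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): returns numeric step positions
# shifted by `nudge`, plus one extra row so the line spans the last bar.
# Assumes the bars sit one unit apart.
pad_steps <- function(x, y, nudge = -0.5) {
  # A discrete x is converted to its integer position on the axis
  pos <- if (is.numeric(x)) x else as.integer(factor(x))
  ord <- order(pos)
  pos <- pos[ord]
  y <- y[ord]
  n <- length(pos)
  data.frame(x = c(pos, pos[n] + 1) + nudge,
             y = c(y, y[n]))
}

set.seed(111)
test2 <- data.frame(a = letters[1:10], b = runif(10, 1, 10))
steps <- pad_steps(test2$a, test2$b)

# The bar layer comes first so that it sets up the discrete x scale;
# the step layer then supplies numeric positions on that scale.
padded_plot <- ggplot(test2, aes(a, b)) +
  geom_col() +
  geom_step(data = steps, aes(x, y), colour = "red", inherit.aes = FALSE)

padded_plot
```

The helper only returns plain numbers, so it can be checked without drawing anything: for ten bars, steps has eleven rows running from x = 0.5 to x = 10.5.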

echasnovski answered Oct 23 '22 17:10


Here's a rather crude solution, but it should work in this case.

Create an alternate data frame that expands each row into three points, at x - 0.5, x and x + 0.5, all with the same height:

test2 <- data.frame(a = rep(test$a, each = 3) + c(-0.5, 0, 0.5),
                    b = rep(test$b, each = 3))

Plot the outline with geom_line:

ggplot(test, aes(a,b)) + geom_bar(stat="identity", alpha=.7) + geom_line(data=test2, colour="red")


This will look tidier if you set the geom_bar width to 1:

ggplot(test, aes(a,b)) + geom_bar(width=1, stat="identity", alpha=.7) + geom_line(data=test2, colour="red")


Adam Quek answered Oct 23 '22 15:10


Since ggplot2 version 3.3.0, this option is supported by geom_step using direction = "mid":

library(tidyverse)

test <- tibble(a = 1:10, b = runif(10, 1, 10))  # tibble() replaces the deprecated data_frame()

test_plot <- ggplot(test, aes(a, b)) + 
  geom_bar(stat="identity") + 
  geom_step(color = 'red', direction = "mid", size = 2)  # in ggplot2 >= 3.4.0, use linewidth = 2 instead of size

test_plot

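The same option should carry over to a discrete x variable, which is the case the question asks about. The following is a minimal sketch, assuming ggplot2 >= 3.3.0; group = 1 is added so that the step line connects all categories instead of treating each one as its own group.

```r
library(ggplot2)

set.seed(111)
test2 <- data.frame(a = letters[1:10], b = runif(10, 1, 10))

# group = 1 puts all categories into one group so the step line connects them
test2_plot <- ggplot(test2, aes(a, b, group = 1)) +
  geom_col() +
  geom_step(colour = "red", direction = "mid")

test2_plot
```

Note that with direction = "mid" the line starts at the centre of the first bar and ends at the centre of the last one, so it does not span their outer halves.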

Molx answered Oct 23 '22 15:10