Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dual axis plot with automatically calculated sec_axis formula in ggplot2

Tags:

r

ggplot2

I need to write a function that would allow me to quickly do a dual axis plot using ggplot2. I know that dual axis plots are generally deprecated, but still I think it may be useful if you're after observing similar patterns in time series (for all of those who disagree, please treat this question strictly technically). It is actually possible with sec_axis() function from ggplot2, but it needs a defined formula. So here's my attempt to calculate this automatically:

dual_plot <- function(data, x, y_left, y_right){
  x <- ensym(x)
  y_left <- ensym(y_left)
  y_right <- ensym(y_right)

  ratio_model <- lm(eval(y_left) ~ eval(y_right), data = data)

  data %>% 
    select(!!x, !!y_left, !!y_right) %>% 
    mutate(!!y_right := predict(ratio_model)) %>% 
    gather(k, v, -!!x) %>% 
    ggplot() + 
    geom_line(aes(!!x, v, colour = k)) +
    scale_y_continuous(sec.axis = sec_axis(~ . / ratio_model$coefficients[[2]] -
                                             ratio_model$coefficients[[1]],
                                           name = rlang::as_string(y_right))) + 
    labs(y = rlang::as_string(y_left))
}

However, lm may fit a negative direction coefficient which reverse the trend and is really misleading. So I need another approach to calculating this formula - either using linear regression with coefficient constrain or a clever way of fitting a formula. How can it be done in R? Or what are the alternatives to sec_axis that would allow to draw dual axis plot automatically?

@Edit: One example would be:

df <- structure(list(date = structure(c(17167, 17168, 17169, 17170, 
17171, 17172, 17173, 17174, 17175, 17176, 17177, 17178, 17179, 
17180, 17181), class = "Date"), y_right = c(-107073.90734625, 
-633197.630546488, -474626.43291613, -306006.801458608, 56062.072352192, 
522580.236751187, 942796.389093215, -101845.73678439, -632658.677118481, 
-479257.088784885, -303439.231633988, 50273.2477880417, 521669.062954895, 
948127.92455586, -107073.90734625), y_left = c(1648808.16, 3152543.07, 
2702739.91, 2382616.25, 1606089.88, 1592465.75, 1537283.99, 2507221.61, 
3049076.19, 3125424.4, 2774215.1, 2356412.98, 1856506.41, 1477195.08, 
2485713.2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-15L))

df %>% 
  dual_plot(date, y_left, y_right)

enter image description here

The calculated ratio model has direction coefficient of -1.02, so the y_right is reversed (where the function is decreasing, the plotted function is increasing and the other way around) and hence misleading.

like image 912
jakes Avatar asked Jun 03 '19 11:06

jakes


1 Answers

The ggplot capability to allow secondary axes (from version 2.2 on) is mostly a labelling benefit. You still have to project your secondary data onto the proper range. I think the easiest and safest way to accomplish that is a min-max transformation, using the ranges to:

  1. Project the second series onto the range of the first to plot the points.
  2. Project back the other way for the labels.

Note there are many ways for this to be misleading in its own way, including the fact that it will use the full range for the secondary variable even if it definitely shouldn't. Take care.

Simple code without function

df %>%
  select(date, y_left, y_right) %>%
  mutate(y_right = scales::rescale(y_right, to=range(df$y_left))) %>%
  gather(key, value, -date) %>%
  ggplot() +
  geom_line(aes(x = date, y = value, color = key)) +
  scale_y_continuous(sec.axis = sec_axis(~ scales::rescale(., to=range(df$y_right)),
                              name = "Right side")) +
  labs(y = "Left side",
       color = "Series")

Dynamic code using tidyeval function

I've tried to conserve your code, and focus on the use of scales::rescale to project from one range to another.

library(scales)
library(tidyverse)

dual_plot <- function(data, x, y_left, y_right) {
  x <- ensym(x)
  y_left <- ensym(y_left)
  y_right <- ensym(y_right)

  # Introducing ranges
  left_range <- range(data %>% pull(!!y_left))
  right_range <- range(data %>% pull(!!y_right))

  data %>%
    select(!!x, !!y_left, !!y_right) %>%
    # Transform
    mutate(!!y_right := scales::rescale(!!y_right, to=left_range)) %>%
    gather(k, v, -!!x) %>%
    ggplot() +
    geom_line(aes(!!x, v, colour = k)) +
    # Change secondary axis scaling and label
    scale_y_continuous(sec.axis = sec_axis(~ scales::rescale(., to=right_range),
                                           name = rlang::as_string(y_right))) +
    labs(y = rlang::as_string(y_left),
         color = "Series")
}

Example output

I think this output, while different than other answers, preserves the nature of the data and ranges for both primary and secondary variables and their axes.

df %>%
  dual_plot(date, y_left, y_right)

enter image description here

More detail on SO here.

Comments welcomed.

like image 91
ravic_ Avatar answered Oct 02 '22 16:10

ravic_