I need to write a function that lets me quickly draw a dual-axis plot using ggplot2. I know that dual-axis plots are generally discouraged, but I still think they can be useful when you are looking for similar patterns in time series (for those who disagree, please treat this question as strictly technical). It is actually possible with the sec_axis() function from ggplot2, but it needs an explicit transformation formula. So here's my attempt to calculate it automatically:
library(tidyverse)  # ggplot2, dplyr, tidyr

dual_plot <- function(data, x, y_left, y_right){
  x <- ensym(x)
  y_left <- ensym(y_left)
  y_right <- ensym(y_right)
  # Fit y_left = a + b * y_right to derive the axis transformation
  ratio_model <- lm(eval(y_left) ~ eval(y_right), data = data)
  data %>%
    select(!!x, !!y_left, !!y_right) %>%
    # Replace y_right with its fitted values so it shares the left scale
    mutate(!!y_right := predict(ratio_model)) %>%
    gather(k, v, -!!x) %>%
    ggplot() +
    geom_line(aes(!!x, v, colour = k)) +
    # Invert the fit for the axis labels: y_right = (v - a) / b
    scale_y_continuous(sec.axis = sec_axis(~ (. - ratio_model$coefficients[[1]]) /
                                             ratio_model$coefficients[[2]],
                                           name = rlang::as_string(y_right))) +
    labs(y = rlang::as_string(y_left))
}
However, lm may fit a negative slope coefficient, which reverses the trend and is really misleading. So I need another approach to calculating this formula: either linear regression with a constrained coefficient or some other clever way of fitting the formula. How can this be done in R? Or what alternatives to sec_axis would allow drawing a dual-axis plot automatically?
@Edit: One example would be:
df <- structure(list(date = structure(c(17167, 17168, 17169, 17170,
17171, 17172, 17173, 17174, 17175, 17176, 17177, 17178, 17179,
17180, 17181), class = "Date"), y_right = c(-107073.90734625,
-633197.630546488, -474626.43291613, -306006.801458608, 56062.072352192,
522580.236751187, 942796.389093215, -101845.73678439, -632658.677118481,
-479257.088784885, -303439.231633988, 50273.2477880417, 521669.062954895,
948127.92455586, -107073.90734625), y_left = c(1648808.16, 3152543.07,
2702739.91, 2382616.25, 1606089.88, 1592465.75, 1537283.99, 2507221.61,
3049076.19, 3125424.4, 2774215.1, 2356412.98, 1856506.41, 1477195.08,
2485713.2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-15L))
df %>%
dual_plot(date, y_left, y_right)
The fitted model has a slope of -1.02, so y_right is reversed (where the series is decreasing, the plotted line is increasing, and vice versa) and hence misleading.
The ggplot2 capability to add secondary axes (from version 2.2 on) is mostly a labelling benefit. You still have to project your secondary data onto the proper range yourself. I think the easiest and safest way to accomplish that is a min-max transformation: use the range of the left variable to rescale the right variable for plotting, and use the range of the right variable to label the secondary axis.
Note that this can be misleading in its own ways, including the fact that it uses the full range of the secondary variable even when it definitely shouldn't. Take care.
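df %>%
  select(date, y_left, y_right) %>%
  # Project y_right onto the range of y_left for plotting
  mutate(y_right = scales::rescale(y_right, to = range(df$y_left))) %>%
  gather(key, value, -date) %>%
  ggplot() +
  geom_line(aes(x = date, y = value, color = key)) +
  # Map axis values back to y_right units; `from` pins the mapping to the data
  # range, so the labels are not shifted by the axis expansion
  scale_y_continuous(sec.axis = sec_axis(~ scales::rescale(., to = range(df$y_right),
                                                           from = range(df$y_left)),
                                         name = "Right side")) +
  labs(y = "Left side",
       color = "Series")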
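For reference, the min-max mapping that scales::rescale performs can be written out by hand. This is only a minimal base-R sketch: rescale_range is a hypothetical helper name (not part of any package), and the numbers are illustrative rather than taken from the question's data. It shows that projecting the secondary series onto the primary range and then mapping back with the ranges swapped recovers the original values, which is what the secondary axis labels depend on.

```r
# Hypothetical helper: linear map of x from the interval `from` to the interval `to`.
rescale_range <- function(x, from, to) {
  (x - from[1]) / diff(from) * diff(to) + to[1]
}

right <- c(-600, 0, 900)        # illustrative values for the secondary series
left_range  <- c(1500, 3000)    # illustrative range of the primary series
right_range <- range(right)

# Forward: project the secondary series onto the primary range for plotting.
projected <- rescale_range(right, from = right_range, to = left_range)

# Inverse: what the secondary axis must apply to show the original units.
recovered <- rescale_range(projected, from = left_range, to = right_range)
```

Because both directions are linear with a positive slope, the plotted line keeps the direction of the original series, unlike a regression fit that may come out with a negative slope.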
I've tried to preserve your code and focus on the use of scales::rescale to project from one range to another.
library(scales)
library(tidyverse)
dual_plot <- function(data, x, y_left, y_right) {
  x <- ensym(x)
  y_left <- ensym(y_left)
  y_right <- ensym(y_right)

  # Introducing ranges
  left_range <- range(data %>% pull(!!y_left))
  right_range <- range(data %>% pull(!!y_right))

  data %>%
    select(!!x, !!y_left, !!y_right) %>%
    # Transform: project y_right onto the range of y_left
    mutate(!!y_right := scales::rescale(!!y_right, to = left_range)) %>%
    gather(k, v, -!!x) %>%
    ggplot() +
    geom_line(aes(!!x, v, colour = k)) +
    # Change secondary axis scaling and label; `from` pins the inverse mapping
    # to the data range, so the labels are not shifted by the axis expansion
    scale_y_continuous(sec.axis = sec_axis(~ scales::rescale(., to = right_range,
                                                             from = left_range),
                                           name = rlang::as_string(y_right))) +
    labs(y = rlang::as_string(y_left),
         color = "Series")
}
I think this output, while different from the other answers, preserves the nature of the data and the ranges of both the primary and secondary variables and their axes.
df %>%
dual_plot(date, y_left, y_right)
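One way to soften the caveat about always using the full range of the secondary variable is to let the caller supply that range. The following is a hedged sketch, not part of the original answer: dual_plot_limits and its right_limits argument are hypothetical names, and it uses tidyr's pivot_longer in place of gather. When right_limits is NULL it falls back to the observed range of the data.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Hypothetical variant: `right_limits` is an assumed extra argument.
dual_plot_limits <- function(data, x, y_left, y_right, right_limits = NULL) {
  x <- ensym(x)
  y_left <- ensym(y_left)
  y_right <- ensym(y_right)

  left_range <- range(pull(data, !!y_left), na.rm = TRUE)
  # Fall back to the observed range when no limits are supplied
  right_range <- if (is.null(right_limits)) {
    range(pull(data, !!y_right), na.rm = TRUE)
  } else {
    right_limits
  }

  data %>%
    select(!!x, !!y_left, !!y_right) %>%
    # Project y_right from the chosen right-hand range onto the left range
    mutate(!!y_right := scales::rescale(!!y_right, to = left_range,
                                        from = right_range)) %>%
    pivot_longer(-!!x, names_to = "k", values_to = "v") %>%
    ggplot() +
    geom_line(aes(!!x, v, colour = k)) +
    # Inverse mapping for the secondary axis labels
    scale_y_continuous(
      sec.axis = sec_axis(~ scales::rescale(., to = right_range, from = left_range),
                          name = rlang::as_string(y_right))
    ) +
    labs(y = rlang::as_string(y_left), colour = "Series")
}
```

For example, dual_plot_limits(df, date, y_left, y_right, right_limits = c(-1e6, 1e6)) would draw the right-hand series against a symmetric range instead of its observed minimum and maximum; the limits here are only illustrative.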
More detail on SO here.
Comments welcomed.