Displaying geom_smooth() trend line from a specified x value

Tags: r, ggplot2

Suppose we have a dataset containing count data for multiple time periods and multiple groups, in the following format:

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

    group week rate
1       1    1  604
2       1    2  598
3       1    3  578
4       1    4  591
5       1    5  589
6       1    6  571
7       1    7  581
8       1    8  597
9       1    9  589
10      1   10  584

I'm interested in fitting a model-based trend line per group; however, I want this trend line to be displayed only from a certain x value onwards. To visualize the trend line fitted to all data points (requires ggplot2, plus dplyr or magrittr for the %>% pipe):

library(ggplot2)
library(dplyr)  # provides the %>% pipe

df %>%
 ggplot(aes(x = week,
            y = rate,
            group = group,
            lty = group)) + 
 geom_line() +
 geom_point() +
 geom_smooth(method = "glm", 
             method.args = list(family = "quasipoisson"),
             se = FALSE) 

Plot 1
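With its default formula y ~ x, the geom_smooth() call above fits, separately for each group, a quasi-Poisson GLM of rate on week. A minimal sketch of the same per-group fit done by hand with stats::glm(), which can help when checking the numbers behind the drawn curves (the names fits and trend are illustrative, not from the question):

```r
# Rebuild the example data from the question
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

# One quasi-Poisson GLM per group, fitted to all 50 weeks
# (the model geom_smooth() fits with its default formula y ~ x)
fits <- lapply(split(df, df$group), function(d) {
  glm(rate ~ week, family = quasipoisson(), data = d)
})

# Fitted trend on the response (count) scale for every group and week
trend <- do.call(rbind, lapply(names(fits), function(g) {
  data.frame(group = g,
             week = 1:50,
             fit = unname(predict(fits[[g]],
                                  newdata = data.frame(week = 1:50),
                                  type = "response")))
}))

# Per-group slopes on the log (link) scale
sapply(fits, function(m) coef(m)[["week"]])
```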

Alternatively, to fit the model to a specific range of values only (requires ggplot2 and dplyr):

df %>%
 group_by(group) %>%
 mutate(rate2 = ifelse(week < 35, NA, rate)) %>%
 ggplot(aes(x = week,
            y = rate,
            group = group,
            lty = group)) + 
 geom_line() +
 geom_point() +
 geom_smooth(aes(y = rate2),
             method = "glm", 
             method.args = list(family = "quasipoisson"),
             se = FALSE)

Plot 2
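The Plot 2 approach does more than hide part of the curve: because rate2 is NA before week 35, geom_smooth() refits the model to weeks 35 to 50 only. A minimal sketch comparing the two fits for group 1 with stats::glm() (object names are illustrative):

```r
set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

g1 <- subset(df, group == "1")

# Model behind Plot 1: fitted to all weeks
full <- glm(rate ~ week, family = quasipoisson(), data = g1)
# Model behind Plot 2: fitted to weeks 35+ only (rows where rate2 is NA are dropped)
late <- glm(rate ~ week, family = quasipoisson(), data = subset(g1, week >= 35))

# Predicted trend over weeks 35-50 under each model
new_weeks <- data.frame(week = 35:50)
pred <- data.frame(week = 35:50,
                   full = predict(full, newdata = new_weeks, type = "response"),
                   late = predict(late, newdata = new_weeks, type = "response"))
pred
```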

However, I cannot find a way to fit the models using all data but display the trend line only from a specific x value onwards (let's say 35+). In other words, I want the trend line as computed for Plot 1, but displayed as in Plot 2, using ggplot2 and ideally a single pipeline.
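A possible workaround, sketched here as an alternative technique rather than a geom_smooth() feature: fit the quasi-Poisson GLM to all weeks of each group inside the pipeline, keep its fitted values in a new column (trend, an illustrative name), and draw that column with geom_line() only where week >= 35. It assumes dplyr and a reasonably recent ggplot2 that accepts a function as a layer's data argument. Unlike geom_smooth(), the curve is drawn through the fitted values at the observed weeks only.

```r
library(ggplot2)
library(dplyr)

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

p <- df %>%
  group_by(group) %>%
  # Fit the model to ALL weeks of the current group and keep its fitted values
  mutate(trend = fitted(glm(rate ~ week, family = quasipoisson(),
                            data = data.frame(rate, week)))) %>%
  ungroup() %>%
  ggplot(aes(x = week,
             y = rate,
             group = group,
             lty = group)) +
  geom_line() +
  geom_point() +
  # Draw the fitted trend only from week 35 onwards (geom_smooth's default blue)
  geom_line(data = function(d) filter(d, week >= 35),
            aes(y = trend),
            colour = "#3366FF")
p
```

Because the filtering happens only in the third layer's data, the raw lines and points still cover all weeks.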

asked Feb 07 '21 18:02 by tmfmnk


People also ask

What does the geom_smooth() message "using formula 'y ~ x'" mean?

The message geom_smooth() using formula 'y ~ x' is not an error. Since you did not supply a formula for the fit, geom_smooth() assumed y ~ x, i.e. y modelled as a function of x (a straight line when method = "lm", a flexible curve for LOESS). You can avoid this message by supplying the formula explicitly, e.g. geom_smooth(formula = y ~ x, method = "lm").
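A small sketch of the difference, using the built-in mtcars data set purely for illustration: leaving out formula lets geom_smooth() print its "using formula" message when the plot is built, while supplying formula explicitly keeps it quiet. The helper messages_from() is hypothetical, written only for this check.

```r
library(ggplot2)

# formula left out: ggplot2 reports the formula it chose
p_default <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

# formula given explicitly: no message is emitted
p_explicit <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Hypothetical helper: collect any messages raised while building a plot
messages_from <- function(p) {
  msgs <- character()
  withCallingHandlers(
    ggplot_build(p),
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      invokeRestart("muffleMessage")
    }
  )
  msgs
}

messages_from(p_default)   # contains the "using formula" note
messages_from(p_explicit)  # character(0)
```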

What does the geom_smooth() function do in R?

The geom_smooth() function belongs to the ggplot2 visualization package in R. Essentially, geom_smooth() adds a trend line (a smoothed conditional mean) over an existing plot. By default, the smoothing method is chosen from the size of the largest group: LOESS (stats::loess()) for fewer than 1,000 observations, otherwise a GAM (mgcv::gam()).
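To check which smoother is used by default on a small data set, one can compare the computed layer data of a default geom_smooth() with an explicit method = "loess" fit. A minimal sketch using the built-in mtcars data (32 rows, well under the 1,000-observation threshold), assuming ggplot2 3.3 or later:

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Default smoother: method and formula chosen by ggplot2
# (suppressMessages() hides the "using method = 'loess'" note)
smooth_default <- suppressMessages(
  layer_data(base + geom_smooth(), i = 2)
)

# Explicit LOESS smoother with the same formula
smooth_loess <- layer_data(
  base + geom_smooth(method = "loess", formula = y ~ x), i = 2
)

# Same fitted curve in both cases
all.equal(smooth_default$y, smooth_loess$y)
```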

What does the se argument to geom_smooth() do?

se: display the confidence interval around the smooth (TRUE by default; see level to control the confidence level, which defaults to 0.95).
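A minimal sketch of se and level in action, again using mtcars purely for illustration: with se = TRUE the computed layer data carries ymin/ymax bounds for the band, and raising level widens it while leaving the fitted line unchanged.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# 95% band (the default level) versus a 99% band around a linear fit
band_95 <- layer_data(base + geom_smooth(method = "lm", formula = y ~ x,
                                         se = TRUE), i = 2)
band_99 <- layer_data(base + geom_smooth(method = "lm", formula = y ~ x,
                                         se = TRUE, level = 0.99), i = 2)

# The 99% band is wider at every x
all((band_99$ymax - band_99$ymin) > (band_95$ymax - band_95$ymin))

# se = FALSE draws the fitted line only, without a band
no_band <- base + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```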


1 Answer

I looked into the after_stat() function mentioned by @tjebo. See whether the following works for you:

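library(ggplot2)  # after_stat() and after_scale() need ggplot2 >= 3.3.0
library(dplyr)    # for %>%

df %>%
  ggplot(aes(x = week,
             y = rate,
             lty = group)) + 
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm", 
              # split each fitted line into an x <= 35 part and an x > 35 part
              aes(group = after_stat(interaction(group, x > 35)),
                  # make the x <= 35 part fully transparent (alpha 0)
                  colour = after_scale(alpha(colour, as.numeric(x > 35)))),
              method.args = list(family = "quasipoisson"),
              se = FALSE)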

result

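This works by splitting the points computed for each line into two groups, those in the x <= 35 region and those in the x > 35 region, and giving each new group its own colour transparency: fully transparent for x <= 35 and opaque for x > 35. The split is needed because a single line's colour shouldn't vary along its length. Because the grouping is changed only after the statistical transformation (after_stat()), each model is still fitted to all weeks of its group. As a result, only the parts of the lines in the x > 35 region are visible.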
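One way to check this explanation is to inspect the built data of the smoothing layer with layer_data(). A minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat() and after_scale()): every computed point with x <= 35 should end up with a fully transparent colour, and every point with x > 35 with an opaque one.

```r
library(ggplot2)
library(dplyr)

set.seed(123)
df <- data.frame(group = as.factor(rep(1:3, each = 50)),
                 week = rep(1:50, 3),
                 rate = c(round(700 - rnorm(50, 100, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0),
                          round(1000 - rnorm(50, 200, 10) - 1:50 * 2, 0)))

p <- df %>%
  ggplot(aes(x = week, y = rate, lty = group)) +
  geom_line() +
  geom_point() +
  geom_smooth(method = "glm",
              aes(group = after_stat(interaction(group, x > 35)),
                  colour = after_scale(alpha(colour, as.numeric(x > 35)))),
              method.args = list(family = "quasipoisson"),
              se = FALSE)

# Built data of the third layer (geom_smooth); the legend warning is muffled
smooth <- suppressWarnings(layer_data(p, i = 3))

# Alpha channel (0-255) of each computed point's colour
alpha_channel <- col2rgb(smooth$colour, alpha = TRUE)["alpha", ]

table(visible = alpha_channel > 0, after_35 = smooth$x > 35)
```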

Running the code triggers a warning that the after_scale() modification is not applied to the legend. I don't think that's a problem, though, since we don't need it in the legend anyway.

answered Oct 06 '22 08:10 by Z.Lin