ggplot2 geom_smooth, extended model for method=lm

I would like to use geom_smooth to get a fitted line from a certain linear regression model.

It seems to me that the formula argument can only use x and y, not any additional variable.

To show more clearly what I want:

library(dplyr)
library(ggplot2)
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE)) %>%
  mutate(
    outcome = 100 + 10*pred +
      ifelse(factor == "B", 200, 0) +
      ifelse(factor == "B", 4, 0)*pred +
      rnorm(100, 0, 60))

With

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  theme_bw()

I get fitted lines that, because the color = factor mapping makes geom_smooth fit one regression per group, are basically the fitted values of the linear model lm(outcome ~ pred*factor, df).

[Plot: one linear fit per factor level, equivalent to lm(outcome ~ pred*factor, df)]
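The equivalence claimed above can be checked without plotting. The sketch below rebuilds the example data in base R with the same seed and compares one lm(outcome ~ pred) per factor level (what geom_smooth(method = "lm") fits when color = factor defines the groups) with the fitted values of the interaction model. The names fit_int and per_group are illustrative only.

```r
# Rebuild the question's example data in base R (same seed and formula).
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

# Interaction model: separate intercept and slope for each factor level.
fit_int <- lm(outcome ~ pred * factor, data = df)

# One simple regression per group, mirroring geom_smooth(method = "lm")
# when color = factor splits the data.
per_group <- numeric(nrow(df))
for (lev in unique(df$factor)) {
  idx <- df$factor == lev
  per_group[idx] <- fitted(lm(outcome ~ pred, data = df[idx, ]))
}

# TRUE: both approaches give the same fitted lines.
all.equal(unname(fitted(fit_int)), per_group)
```

Only the point estimates agree: the interaction model pools the residual variance across groups, so its confidence bands differ slightly from the per-group ribbons that geom_smooth draws.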

In some cases, however, I prefer the lines to be the output of a different model fit, like lm(outcome ~ pred + factor, df), for which I can use something like:

fit <- lm(outcome ~ pred+factor, df)
predval <- expand.grid(
  pred = seq(
    min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor)) %>%
  mutate(outcome = predict(fit, newdata = .))

ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point() +
  geom_line(data = predval) +
  theme_bw()

which results in:

[Plot: parallel fitted lines from lm(outcome ~ pred + factor, df)]
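In the additive model lm(outcome ~ pred + factor, df) both levels share one slope, so the predicted lines are parallel and the vertical gap between them equals the factorB coefficient. A small base-R check of that, using a 5-point grid instead of 1000 purely for readability (the grid size is arbitrary):

```r
# Rebuild the question's example data in base R (same seed and formula).
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

fit <- lm(outcome ~ pred + factor, data = df)

# Prediction grid: expand.grid varies pred fastest, so rows 1-5 are "A"
# and rows 6-10 are "B" at the same pred values.
grid <- expand.grid(pred = seq(min(df$pred), max(df$pred), length.out = 5),
                    factor = c("A", "B"))
grid$outcome <- predict(fit, newdata = grid)

# Vertical gap between the two lines at each pred value.
gap <- unname(grid$outcome[grid$factor == "B"] -
                grid$outcome[grid$factor == "A"])

gap                            # the same value repeated 5 times
unname(coef(fit)["factorB"])   # equals that constant gap
```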

My question: is there a way to produce the latter graph with geom_smooth instead? I know geom_smooth has a formula argument, but I can't get something like formula = y ~ x + factor or formula = y ~ x + color (since I mapped color = factor) to work.

asked Mar 09 '18 10:03 by Dries




1 Answer

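This is a very interesting question. Probably the main reason why geom_smooth is so "resistant" to custom models of multiple variables is that it only draws 2-D curves; consequently, its arguments are designed for two-dimensional data, and formula can only refer to the y and x aesthetics (i.e. formula = response variable ~ independent variable). In addition, because color = factor splits the data into groups, the model is fitted separately within each group, so inside any single fit the grouping variable is constant and cannot enter the model as a term.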

The trick to getting what you requested is to use the mapping argument of geom_smooth instead of formula. As you've probably seen in the documentation, formula only specifies the mathematical structure of the model (e.g. linear, quadratic) in terms of x and y. The mapping argument, by contrast, lets you supply new y-values directly, such as the fitted values of a custom linear model obtained with predict(). Within each factor level, the predictions of lm(outcome ~ pred + factor) lie exactly on a straight line, so the per-group lm that geom_smooth fits to them reproduces that model's lines.
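The reason the mapping trick works can be checked directly: refit lm(y ~ x) within each level on the additive model's predictions, as geom_smooth does, and the coefficients come back as that model's group-specific intercepts and shared slope, with residuals at rounding-error level. A base-R sketch; the column name pred_plm and the objects refit_A / refit_B are illustrative choices.

```r
# Rebuild the question's example data in base R (same seed and formula).
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

plm <- lm(outcome ~ pred + factor, data = df)
b <- coef(plm)

# Store the additive model's predictions, as aes(y = predict(plm, df)) does.
df$pred_plm <- predict(plm, df)

# Refit a simple regression within each group, as geom_smooth would.
refit_A <- lm(pred_plm ~ pred, data = df[df$factor == "A", ])
refit_B <- lm(pred_plm ~ pred, data = df[df$factor == "B", ])

# Intercepts: b0 for "A", b0 + factorB for "B"; both slopes equal b["pred"].
rbind(A = coef(refit_A), B = coef(refit_B))
c(A = b[["(Intercept)"]], B = b[["(Intercept)"]] + b[["factorB"]],
  slope = b[["pred"]])

# Residuals are essentially zero: each refit passes exactly through its points.
max(abs(residuals(refit_A)), abs(residuals(refit_B)))
```

This also shows the limit of the trick: it is exact here because the custom model is linear in pred within each level. For a custom model that is curved in pred, the formula given to geom_smooth would have to match that shape as well.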

Note that, by default, inherit.aes is set to TRUE, so your plotted regressions will be coloured appropriately by your categorical variable. Here's the code:

# original plot: one lm per colour group, equivalent to outcome ~ pred * factor
plot1 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm") +
  ggtitle("outcome ~ pred * factor") +
  theme_bw()

# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data=df)

# plot with lm for outcome ~ pred + factor:
# map y to the model's predictions; geom_smooth then refits them per group.
# That refit is exact, so its confidence ribbon has zero width; se = FALSE hides it.
plot2 <- ggplot(df, aes(x=pred, y=outcome, color=factor)) +
  geom_point(aes(color=factor)) +
  geom_smooth(method = "lm", mapping=aes(y=predict(plm,df)), se = FALSE) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()

[Plots: plot1 and plot2]
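To confirm that the lines drawn by this approach really are the additive model's, the computed smooth layer can be extracted with ggplot2's layer_data() and compared with predict(). A sketch, assuming ggplot2 is installed; it rebuilds the plot without title and theme, keeps the default se = TRUE to expose the ribbon, and relies on ggplot2 numbering the colour groups in sorted order (group 1 = "A", group 2 = "B").

```r
library(ggplot2)

# Rebuild the question's example data in base R (same seed and formula).
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- 100 + 10 * df$pred +
  ifelse(df$factor == "B", 200, 0) +
  ifelse(df$factor == "B", 4, 0) * df$pred +
  rnorm(100, 0, 60)

plm <- lm(outcome ~ pred + factor, data = df)

p <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  geom_smooth(method = "lm", mapping = aes(y = predict(plm, df)))

# Layer 2 is the smooth; its x/y columns hold the drawn curve points.
ld <- layer_data(p, 2)

# Additive-model predictions at the same x values, mapping group ids to levels
# (assumes group 1 = "A", group 2 = "B", i.e. sorted order).
expected <- predict(plm, newdata = data.frame(pred = ld$x,
                                              factor = c("A", "B")[ld$group]))

all.equal(ld$y, unname(expected))   # the smooth reproduces plm's lines
max(ld$ymax - ld$ymin)              # ribbon width: essentially zero
```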

answered Oct 26 '22 13:10 by Marcus Campbell