I would like to use geom_smooth to get a fitted line from a certain linear regression model. It seems to me that the formula can only take x and y and not any additional parameter. To show more clearly what I want:
library(dplyr)
library(ggplot2)
set.seed(35413)
# Simulated data: the slope of outcome on pred differs between the two factor levels
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE)) %>%
  mutate(
    outcome = 100 + 10*pred +
      ifelse(factor == "B", 200, 0) +
      ifelse(factor == "B", 4, 0)*pred +
      rnorm(100, 0, 60))
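For reference, the simulation above corresponds to the following data-generating model (the indicator notation is introduced here for clarity; it is not in the original):

```latex
\text{outcome}_i = 100 + 10\,\text{pred}_i + 200\,\mathbb{1}[\text{factor}_i = B] + 4\,\text{pred}_i\,\mathbb{1}[\text{factor}_i = B] + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0,\ 60^2)
```

so the true lines are 100 + 10 pred for level A and 300 + 14 pred for level B, i.e. they are not parallel.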
With
ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point(aes(color = factor)) +
  geom_smooth(method = "lm") +
  theme_bw()
I produce fitted lines that, due to the color = factor option, are basically the output of the linear model lm(outcome ~ pred*factor, df).
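A minimal base-R sketch of why "basically" is the right word: fitting one lm(outcome ~ pred) per level of factor (which is what grouping by colour amounts to) gives the same intercepts and slopes as the single interaction model. The data are regenerated here with base R instead of dplyr so the block runs on its own. The point estimates match; the confidence bands need not, because the interaction model pools the residual variance across both groups while separate fits do not.

```r
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- with(df, 100 + 10 * pred + ifelse(factor == "B", 200, 0) +
                     ifelse(factor == "B", 4, 0) * pred + rnorm(100, 0, 60))

# One straight-line fit per level of factor (intercept, slope)
per_group <- lapply(split(df, df$factor),
                    function(d) coef(lm(outcome ~ pred, data = d)))

# A single interaction model; level "A" is the reference level
inter <- coef(lm(outcome ~ pred * factor, data = df))
inter_lines <- list(
  A = c(inter[["(Intercept)"]], inter[["pred"]]),
  B = c(inter[["(Intercept)"]] + inter[["factorB"]],
        inter[["pred"]] + inter[["pred:factorB"]])
)

# Largest absolute difference between the two sets of intercepts and slopes
max_diff <- max(abs(unlist(per_group) - unlist(inter_lines)))
print(max_diff)
```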
In some cases, however, I prefer the lines to be the output of a different model fit, like lm(outcome ~ pred + factor, df), for which I can use something like:
fit <- lm(outcome ~ pred + factor, df)
# Prediction grid: 1000 values of pred for each level of factor
predval <- expand.grid(
  pred = seq(min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor)) %>%
  mutate(outcome = predict(fit, newdata = .))
ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  geom_line(data = predval) +
  theme_bw()
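which results in a plot with two parallel fitted lines (a common slope, with an intercept that differs by factor level).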
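A small sketch that checks the shape of the additive model's lines numerically: each predicted line has slope coef(fit)["pred"], and the vertical gap between them is coef(fit)["factorB"]. It uses base R (predval$outcome <- predict(...)) in place of the dplyr mutate() step, which is assumed to be equivalent here.

```r
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- with(df, 100 + 10 * pred + ifelse(factor == "B", 200, 0) +
                     ifelse(factor == "B", 4, 0) * pred + rnorm(100, 0, 60))

fit <- lm(outcome ~ pred + factor, data = df)
predval <- expand.grid(
  pred = seq(min(df$pred), max(df$pred), length.out = 1000),
  factor = unique(df$factor))
predval$outcome <- predict(fit, newdata = predval)

line_A <- predval[predval$factor == "A", ]
line_B <- predval[predval$factor == "B", ]

# Slope of a predicted line, from its first and last grid points
end_slope <- function(d) {
  n <- nrow(d)
  (d$outcome[n] - d$outcome[1]) / (d$pred[n] - d$pred[1])
}

# Vertical distance between the two lines at the same pred values
gap <- line_B$outcome - line_A$outcome
print(c(slope_A = end_slope(line_A), slope_B = end_slope(line_B),
        gap = mean(gap)))
```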
My question: is there a way to produce the latter graph using geom_smooth instead? I know there is a formula = option in geom_smooth, but I can't make something like formula = y ~ x + factor or formula = y ~ x + color (as I defined color = factor) work.
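One plausible reason, stated as an assumption about geom_smooth's internals rather than a documented fact: the smoother is fitted separately within each colour group, with a formula written in the aesthetics x and y, so within a group the factor takes only a single value. The base-R sketch below only shows the consequence of that with plain lm: a one-level factor cannot enter the model at all.

```r
set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- with(df, 100 + 10 * pred + ifelse(factor == "B", 200, 0) +
                     ifelse(factor == "B", 4, 0) * pred + rnorm(100, 0, 60))

# Roughly what a single colour group looks like: factor is constant ("A")
group_A <- df[df$factor == "A", ]

# Try to fit the additive model inside that one group; capture the error
res <- tryCatch(lm(outcome ~ pred + factor, data = group_A),
                error = function(e) e)
print(conditionMessage(res))
```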
We have our scatterplot, and we're adding a trend line as a new layer with + and geom_smooth(). But in this case, we're adding a straight-line linear model instead of a LOESS line. To do this, we simply set method = "lm". (If you haven't figured it out, "lm" means "linear model".)
geom_line creates a single line for both panels and distributes the colours according to the colour variable, while geom_smooth does not draw the smooth line in the second panel.
geom_smooth() and stat_smooth() are effectively aliases: they both use the same arguments. Use stat_smooth() if you want to display the results with a non-standard geom.
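A short sketch of that documented pattern: stat_smooth() does the same linear fit as geom_smooth(method = "lm"), but geom = "line" draws only the fitted lines, without the confidence ribbon. The data are regenerated with base R so the block runs on its own; layer_data() is used only to confirm that the layer computed something for both groups.

```r
library(ggplot2)

set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- with(df, 100 + 10 * pred + ifelse(factor == "B", 200, 0) +
                     ifelse(factor == "B", 4, 0) * pred + rnorm(100, 0, 60))

p <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  # Same computation as geom_smooth(method = "lm"), drawn with a line geom
  stat_smooth(method = "lm", formula = y ~ x, geom = "line") +
  theme_bw()

# Computed data of the smoothing layer (one fitted curve per colour group)
ld <- layer_data(p, 2)
# print(p) to draw the plot
```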
I hope this question isn't a duplicate.
This is a very interesting question. Probably the main reason why geom_smooth is so "resistant" to allowing custom models of multiple variables is that it is limited to producing 2-D curves; consequently, its arguments are designed for handling two-dimensional data (i.e. a formula of the form response ~ predictor, written in terms of the aesthetics, such as y ~ x).
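The trick to getting what you requested is using the mapping argument within geom_smooth instead of formula. As you've probably seen from the documentation, formula only lets you specify the mathematical structure of the model in terms of x and y (e.g. linear, quadratic, etc.). Conversely, the mapping argument lets you supply new y-values directly, such as the predictions of a custom linear model obtained with predict(). geom_smooth then fits its own straight line through those predicted values within each colour group; because the predictions of an additive model such as outcome ~ pred + factor lie exactly on one straight line per factor level, that line coincides with the custom model's line.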
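To illustrate what formula does control, here is a sketch that changes only the shape of the per-group fit to a quadratic in x. Note that the formula refers to the aesthetics x and y, not to the column names pred and outcome; the data are regenerated with base R for self-containment.

```r
library(ggplot2)

set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- with(df, 100 + 10 * pred + ifelse(factor == "B", 200, 0) +
                     ifelse(factor == "B", 4, 0) * pred + rnorm(100, 0, 60))

p <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  # A quadratic in x, fitted separately for each colour group
  geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  theme_bw()

ld <- layer_data(p, 2)
# print(p) to draw the plot
```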
Note that, by default, inherit.aes is set to TRUE, so your plotted regressions will be coloured appropriately by your categorical variable. Here's the code:
library(ggplot2)
# original plot: one lm(y ~ x) per colour group
plot1 <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point(aes(color = factor)) +
  geom_smooth(method = "lm") +
  ggtitle("outcome ~ pred (fitted per factor level)") +
  theme_bw()
# declare new model here
plm <- lm(formula = outcome ~ pred + factor, data = df)
# plot with lm for outcome ~ pred + factor:
# y is remapped to plm's predictions, and geom_smooth fits a line through them
plot2 <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point(aes(color = factor)) +
  geom_smooth(method = "lm", mapping = aes(y = predict(plm, df))) +
  ggtitle("outcome ~ pred + factor") +
  theme_bw()
plot1
plot2
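A sketch that checks the remapping numerically: the curve geom_smooth draws for each group should equal plm's predictions at the same x values. Two choices here are mine, not the answer's: the fitted values are stored in a hypothetical column plm_fit (equivalent to predict(plm, df) for these rows), and se = FALSE is set because the default ribbon would describe geom_smooth's refit through points with essentially zero residuals, so it collapses onto the line instead of showing plm's uncertainty.

```r
library(ggplot2)

set.seed(35413)
df <- data.frame(pred = runif(100, 10, 100),
                 factor = sample(c("A", "B"), 100, replace = TRUE))
df$outcome <- with(df, 100 + 10 * pred + ifelse(factor == "B", 200, 0) +
                     ifelse(factor == "B", 4, 0) * pred + rnorm(100, 0, 60))

plm <- lm(outcome ~ pred + factor, data = df)
df$plm_fit <- fitted(plm)  # hypothetical helper column: plm's fitted values

p <- ggplot(df, aes(x = pred, y = outcome, color = factor)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              mapping = aes(y = plm_fit)) +
  theme_bw()

# Smoothed curves as computed by ggplot2; group 1 = "A", group 2 = "B"
ld <- layer_data(p, 2)
lvl <- sort(unique(df$factor))
expected <- predict(plm, newdata = data.frame(pred = ld$x,
                                              factor = lvl[ld$group]))
print(max(abs(ld$y - expected)))
```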