Let's say that I have some data and I have created a linear model to fit the data. Then I plot the data using ggplot2 and I want to add the linear model to the plot. As far as I know, this is the standard way of doing it (using the built-in cars
dataset):
library(ggplot2)
fit <- lm(dist ~ speed, data = cars)
summary(fit)
p <- ggplot(cars, aes(speed, dist))
p <- p + geom_point()
p <- p + geom_smooth(method='lm')
p
However, the above violates the DRY principle ('don't repeat yourself'): it involves creating the linear model in the call to lm
and then recreating it in the call to geom_smooth
. This seems inelegant to me, and it also introduces a space for bugs. For example, if I change the model that is created with lm
but forget to change the model that is created with geom_smooth
, then the summary and the plot won't be of the same model.
Is there a way of using ggplot2 to plot an already existing linear model, e.g. by passing the lm
object itself to the geom_smooth function?
The geom_smooth() function in ggplot2 can plot fitted lines from models with a simple structure. Supported model types include models fit with lm() , glm() , nls() , and mgcv::gam() . Fitted lines can vary by groups if a factor variable is mapped to an aesthetic like color or group .
Adding a regression line on a ggplot You can use geom_smooth() with method = "lm" . This will automatically add a regression line for y ~ x to the plot.
To create multiple regression lines in a single plot using ggplot2, we can use geom_jitter function along with geom_smooth function. The geom_smooth function will help us to different regression line with different colors and geom_jitter will differentiate the points.
You may notice that we sometimes reference 'ggplot2' and sometimes 'ggplot'. To clarify, 'ggplot2' is the name of the most recent version of the package. However, any time we call the function itself, it's just called 'ggplot'.
What one needs to do is to create a new data frame with the observations from the old one plus the predicted values from the model, then plot that dataframe using ggplot2.
library(ggplot2)
# create and summarise model
cars.model <- lm(dist ~ speed, data = cars)
summary(cars.model)
# add 'fit', 'lwr', and 'upr' columns to dataframe (generated by predict)
cars.predict <- cbind(cars, predict(cars.model, interval = 'confidence'))
# plot the points (actual observations), regression line, and confidence interval
p <- ggplot(cars.predict, aes(speed,dist))
p <- p + geom_point()
p <- p + geom_line(aes(speed, fit))
p <- p + geom_ribbon(aes(ymin=lwr,ymax=upr), alpha=0.3)
p
The great advantage of doing this is that if one changes the model (e.g. cars.model <- lm(dist ~ poly(speed, 2), data = cars)
) then the plot and the summary will both change.
Thanks to Plamen Petrov for making me realise what was needed here. As he points out, this approach will only work if predict
is defined for the model in question; if not, one has to define it oneself.
I believe you want to do something along the lines of :
library(ggplot2)
# install.packages('dplyr')
library(dplyr)
fit <- lm(dist ~ speed, data = cars)
cars %>%
mutate( my_model = predict(fit) ) %>%
ggplot() +
geom_point( aes(speed, dist) ) +
geom_line( aes(speed, my_model) )
This will also work for more complex models as long as the corresponding predict method is defined. Otherwise you will need to define it yourself.
In the case of linear model you can add the confidence/prediction bands with slightly more work and reproduce your plot.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With