
R print equation of linear regression on the plot itself

Tags: r, regression

How do we print the equation of a line on a plot?

I have 2 independent variables and would like an equation like this:

y=mx1+bx2+c

where x1 = cost and x2 = targeting.

I can plot the best-fit line, but how do I print the equation on the plot?

Maybe I can't print both independent variables in one equation, but how do I do it for, say, y=mx1+c at least?

Here is my code:

fit=lm(Signups ~ cost + targeting)
plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
abline(lm(Signups ~ cost))

jxn asked Jun 11 '14 21:06


People also ask

How do you find the equation of a linear regression in R?

The linear regression model can be written as y = b0 + b1*x + e, where b0 and b1 are the regression coefficients (also called beta coefficients or parameters) and e is the error term. b0 is the intercept of the regression line, that is, the predicted value when x = 0, and b1 is the slope of the regression line.

How do you write the equation of a regression line?

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).

Which function creates a regression model from a given formula: lm(), predict(), or summary()?

In R, the lm() function creates a linear regression model from a formula of the form Y ~ X + X2. To inspect the fitted model, use the summary() function.


3 Answers

I tried to automate the output a bit:

fit <- lm(mpg ~ cyl + hp, data = mtcars)
summary(fit)
##Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 36.90833    2.19080  16.847  < 2e-16 ***
## cyl         -2.26469    0.57589  -3.933  0.00048 ***
## hp          -0.01912    0.01500  -1.275  0.21253 


plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(coef(fit)[1:2])

## rounded coefficients for better output
cf <- round(coef(fit), 2) 

## sign check to avoid a plus followed by a minus for negative coefficients
eq <- paste0("mpg = ", cf[1],
             ifelse(cf[2] >= 0, " + ", " - "), abs(cf[2]), " cyl",
             ifelse(cf[3] >= 0, " + ", " - "), abs(cf[3]), " hp")

## printing of the equation
mtext(eq, 3, line=-2)
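
The sign check above is written out by hand for two predictors. As a rough sketch of how it could be generalized, the hypothetical helper below (lm_equation is my own name, not part of the original answer or of any package) builds the same kind of string for any lm fit. It assumes the model has an intercept and at least one predictor.

```r
# Hypothetical helper (not from the original answer): build an equation
# string from a fitted lm object. Assumes an intercept plus >= 1 predictor.
lm_equation <- function(fit, digits = 2) {
  cf <- round(coef(fit), digits)
  response <- all.vars(formula(fit))[1]   # name of the response variable
  # One " + coef name" or " - coef name" piece per non-intercept term
  pieces <- paste0(ifelse(cf[-1] < 0, " - ", " + "),
                   abs(cf[-1]), " ", names(cf)[-1])
  paste0(response, " = ", cf[1], paste(pieces, collapse = ""))
}

fit <- lm(mpg ~ cyl + hp, data = mtcars)
eq  <- lm_equation(fit)

plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(coef(fit)[1:2])
mtext(eq, side = 3, line = -2)   # print the equation inside the top margin
```

With the coefficients shown in the summary above, eq should come out as "mpg = 36.91 - 2.26 cyl - 0.02 hp".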


Hope it helps,

alex

alko989 answered Oct 05 '22 17:10


You can use text() (see ?text). In addition, you should not use abline(lm(Signups ~ cost)), as this is a different model (see my answer on CV here: Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression). At any rate, consider:

set.seed(1)
Signups   <- rnorm(20)
cost      <- rnorm(20)
targeting <- rnorm(20)
fit       <- lm(Signups ~ cost + targeting)

summary(fit)
# ...
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)   0.1494     0.2072   0.721    0.481
# cost         -0.1516     0.2504  -0.605    0.553
# targeting     0.2894     0.2695   1.074    0.298
# ...

dev.new()  # portable replacement for windows(), which exists only on Windows
plot(cost, Signups, xlab="cost", ylab="Signups", main="Signups")
abline(coef(fit)[1:2])
text(-2, -2, adj=c(0,0), labels="Signups = .15 -.15cost + .29targeting")
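
The label in the text() call is typed by hand, so it has to be updated whenever the data change. A minimal sketch of one way around that, assuming the same simulated data: build the label from coef(fit) with sprintf(), and let legend() place it in a corner so no plot coordinates are hard-coded. The name eq_label is mine, and using legend() instead of text() is a substitution, not what the answer itself does.

```r
set.seed(1)
Signups   <- rnorm(20)
cost      <- rnorm(20)
targeting <- rnorm(20)
fit       <- lm(Signups ~ cost + targeting)

b <- coef(fit)
# "%+.2f" always prints an explicit sign, so negative slopes need no special case
eq_label <- sprintf("Signups = %.2f %+.2f cost %+.2f targeting",
                    b["(Intercept)"], b["cost"], b["targeting"])

plot(cost, Signups, xlab = "cost", ylab = "Signups", main = "Signups")
abline(b[1:2])
# A legend with no symbols and no box acts as corner-anchored text
legend("bottomleft", legend = eq_label, bty = "n")
```

text() still works if you prefer explicit coordinates; legend() only saves you from guessing a free spot on the plot.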

gung - Reinstate Monica answered Oct 05 '22 19:10


Here's a solution using tidyverse packages.

The key is the broom package, which simplifies the process of extracting model data. For example:

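library(broom)    # tidy()
library(dplyr)    # %>%, select(), mutate(), ...
library(ggplot2)  # used for the plots further down

fit1 <- lm(mpg ~ cyl, data = mtcars)
summary(fit1)

fit1 %>%
    tidy() %>%
    select(estimate, term)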

Result

# A tibble: 2 x 2
  estimate term       
     <dbl> <chr>      
1    37.9  (Intercept)
2    -2.88 cyl 

I wrote a function to extract and format the information using dplyr:

get_formula <- function(object) {
    object %>% 
        tidy() %>% 
        mutate(
            term = if_else(term == "(Intercept)", "", term),
            sign = case_when(
                term == "" ~ "",
                estimate < 0 ~ "-",
                estimate >= 0 ~ "+"
            ),
            estimate = as.character(round(abs(estimate), digits = 2)),
            term = if_else(term == "", paste(sign, estimate), paste(sign, estimate, term))
        ) %>%
        summarize(terms = paste(term, collapse = " ")) %>%
        pull(terms)
}

get_formula(fit1)

Result

[1] " 37.88 - 2.88 cyl"

Then use ggplot2 to plot the line and add the equation as a caption:

mtcars %>%
    ggplot(mapping = aes(x = cyl, y = mpg)) +
    geom_point() +
    geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
    labs(
        x = "Cylinders", y = "Miles per Gallon", 
        caption = paste("mpg =", get_formula(fit1))
    )

Plot using geom_smooth()
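
A caption sits below the panel. If the equation should appear inside the plotting area itself, one option is ggplot2's annotate("text", ...). The sketch below is an assumption-laden variant of the plot above: it builds the label with sprintf() rather than get_formula() so that it runs on its own, and the names eq_label and p are mine.

```r
library(ggplot2)

fit1 <- lm(mpg ~ cyl, data = mtcars)
cf   <- round(coef(fit1), 2)

# Build "mpg = intercept +/- slope cyl" with an explicit sign
eq_label <- sprintf("mpg = %.2f %s %.2f cyl",
                    cf[1], if (cf[2] < 0) "-" else "+", abs(cf[2]))

p <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
    geom_point() +
    geom_smooth(formula = y ~ x, method = "lm", se = FALSE) +
    # x = Inf, y = Inf pins the text to the top-right corner of the panel;
    # hjust/vjust pull it back inside the plotting area
    annotate("text", x = Inf, y = Inf, label = eq_label,
             hjust = 1.1, vjust = 2) +
    labs(x = "Cylinders", y = "Miles per Gallon")

print(p)
```

The hjust and vjust values are eyeballed; adjust them if the label overlaps points in your own data.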

Plotting a line this way really only makes sense for visualizing the relationship between two variables. As @Glen_b pointed out in a comment, the slope we get from modelling mpg as a function of cyl alone (-2.88) doesn't match the slope we get from modelling mpg as a function of cyl and other variables (-1.29). For example:

fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
summary(fit2)

fit2 %>%
    tidy() %>%
    select(estimate, term)

Result

# A tibble: 5 x 2
  estimate term       
     <dbl> <chr>      
1  40.8    (Intercept)
2  -1.29   cyl        
3   0.0116 disp       
4  -3.85   wt         
5  -0.0205 hp 

That said, if you want to accurately plot the regression line for a model that includes variables that don't appear in the plot, use geom_abline() instead and get the slope and intercept using broom package functions. As far as I know, geom_smooth() formulas can't reference variables that aren't already mapped as aesthetics.

mtcars %>%
    ggplot(mapping = aes(x = cyl, y = mpg)) +
    geom_point() +
    geom_abline(
        slope = fit2 %>% tidy() %>% filter(term == "cyl") %>% pull(estimate),
        intercept = fit2 %>% tidy() %>% filter(term == "(Intercept)") %>% pull(estimate),
        color = "blue"
    ) +
    labs(
        x = "Cylinders", y = "Miles per Gallon", 
        caption = paste("mpg =", get_formula(fit2))
    )

Plot using geom_abline()
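
Note that the raw intercept of fit2 is the predicted mpg when disp, wt and hp are all zero, so the line drawn above can sit away from the points. One common alternative, which is not part of the original answer, is to hold the other predictors at their sample means and shift the intercept accordingly. A minimal base R sketch, where others and adj_intercept are my own names:

```r
fit2 <- lm(mpg ~ cyl + disp + wt + hp, data = mtcars)
b    <- coef(fit2)

# Predictors that are in the model but not on the x-axis
others <- setdiff(names(b), c("(Intercept)", "cyl"))

# Intercept of the cyl line when every other predictor is at its sample mean
adj_intercept <- unname(b["(Intercept)"] + sum(b[others] * colMeans(mtcars[others])))
slope_cyl     <- unname(b["cyl"])

plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per Gallon")
abline(a = adj_intercept, b = slope_cyl, col = "blue")
```

The slope is still the adjusted -1.29, but the line now passes through the point (mean cyl, mean mpg). The same two numbers can be passed to geom_abline(intercept = ..., slope = ...) in the ggplot2 version.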

Damian answered Oct 05 '22 19:10