
Function which runs lm over different variables

Tags: r, lm, quosure

I would like to create a function which can run a regression model (e.g. using lm) over different variables in a given dataset. In this function, I would specify as arguments the dataset I'm using, the dependent variable y and the independent variable x. I want this to be a function and not a loop as I would like to call the code in various places of my script. My naive function would look something like this:

lmfun <- function(data, y, x) {
  lm(y ~ x, data = data)
}

This function obviously does not work because the lm function does not recognize y and x as variables of the dataset.

I have done some research and stumbled upon the following helpful vignette: programming with dplyr. The vignette gives the following solution to a similar problem as the one I am facing:

library(dplyr)  # provides tibble(), enquo(), group_by(), summarise() and %>%

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)

my_sum <- function(df, group_var) {
  group_var <- enquo(group_var)
  df %>%
    group_by(!! group_var) %>%
    summarise(a = mean(a))
}

I am aware that lm is not part of the dplyr package, but I would like to come up with a solution similar to this one. I've tried the following:

lmfun <- function(data, y, x) {
  y <- enquo(y)
  x <- enquo(x)

  lm(!! y ~ !! x, data = data)
}

lmfun(mtcars, mpg, disp)

Running this code gives the following error message:

Error in is_quosure(e2) : argument "e2" is missing, with no default

Does anyone have an idea how to amend the code to make this work?

Thanks,

Joost.

Joost asked Jan 06 '19 11:01


1 Answer

You can fix this by converting the quosures to strings with quo_name() and building the formula from those strings with formula():

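library(rlang)  # provides enquo() and quo_name()

lmfun <- function(data, y, x) {
  # Capture the unquoted column names passed by the caller
  y <- enquo(y)
  x <- enquo(x)

  # Turn the captured names into text, e.g. "mpg~disp", and parse that as a formula
  model_formula <- formula(paste0(quo_name(y), "~", quo_name(x)))
  lm(model_formula, data = data)
}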

lmfun(mtcars, mpg, disp)

# Call:
#   lm(formula = model_formula, data = data)
# 
# Coefficients:
#   (Intercept)         disp  
#      29.59985     -0.04122  
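
If you would rather not depend on tidy evaluation at all, the same idea can be sketched in base R. This is only one possible variant, not the approach from the answer above: the name lmfun_base is made up for illustration, substitute() and deparse() stand in for enquo() and quo_name(), and reformulate() builds the formula from the two strings.

```r
# Hypothetical base R variant (name chosen for illustration only)
lmfun_base <- function(data, y, x) {
  # Capture the unquoted argument names as strings
  y_name <- deparse(substitute(y))
  x_name <- deparse(substitute(x))

  # reformulate() builds the formula y_name ~ x_name from character input
  model_formula <- reformulate(x_name, response = y_name)
  lm(model_formula, data = data)
}

fit <- lmfun_base(mtcars, mpg, disp)

# Running the model over several predictors, using column names as strings
predictors <- c("disp", "hp", "wt")
fits <- lapply(predictors, function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})
names(fits) <- predictors
```

Because the column names are ordinary strings here, looping over a character vector of predictors needs no quoting or unquoting, which may be simpler if the variable names come from elsewhere in your script.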
kath answered Oct 13 '22 19:10