I would like to create a function which can run a regression model (e.g. using lm) over different variables in a given dataset. In this function, I would specify as arguments the dataset I'm using, the dependent variable y and the independent variable x. I want this to be a function and not a loop as I would like to call the code in various places of my script. My naive function would look something like this:
lmfun <- function(data, y, x) {
  lm(y ~ x, data = data)
}
This function obviously does not work because the lm function does not recognize y and x as variables of the dataset.
I have done some research and stumbled upon the following helpful vignette: programming with dplyr. The vignette gives the following solution to a problem similar to the one I am facing:
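library(dplyr)  # provides tibble(), group_by(), summarise(), enquo() and %>%

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)
my_sum <- function(df, group_var) {
  # Capture the unquoted column name supplied by the caller
  group_var <- enquo(group_var)
  df %>%
    group_by(!! group_var) %>%
    summarise(a = mean(a))
}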
I am aware that lm is not part of the dplyr package, but I would like to come up with a solution similar to this one. I've tried the following:
lmfun <- function(data, y, x) {
  y <- enquo(y)
  x <- enquo(x)
  lm(!! y ~ !! x, data = data)
}
lmfun(mtcars, mpg, disp)
Running this code gives the following error message:
Error in is_quosure(e2) : argument "e2" is missing, with no default
Does anyone have an idea how to amend the code to make this work?
Thanks,
Joost.
The lm() function fits linear models to data held in a data frame in R. It can be used to carry out regression, single-stratum analysis of variance, and analysis of covariance, and the fitted model can then be used to predict values for data that is not in the data frame.
Summary: R linear regression uses the lm() function to create a regression model from a formula of the form Y ~ X + X2. To look at the model, use the summary() function. To analyze the residuals, extract the $residuals component of the fitted model or call residuals() on it.
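As a minimal sketch of that workflow, the example below fits a model on the built-in mtcars data, prints its summary, and pulls out the residuals. The choice of mpg, disp and wt as variables is only an assumption made for illustration.

```r
# Fit a linear model of the form Y ~ X + X2 on the built-in mtcars data
fit <- lm(mpg ~ disp + wt, data = mtcars)

# Inspect coefficients, standard errors, R-squared, etc.
fit_summary <- summary(fit)
print(fit_summary)

# Extract the residuals, one per row of the data
res <- residuals(fit)
head(res)
```

The same values are available as fit$residuals; residuals() is simply the accessor form.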
lm returns an object of class "lm" or, for multiple responses, of class c("mlm", "lm"). The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm.
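A short sketch of what those classes and accessors look like in practice, again on mtcars. The variable names fit and multi_fit are made up for this example.

```r
# Single response: the result has class "lm"
fit <- lm(mpg ~ disp, data = mtcars)
class(fit)

# Multiple responses (a matrix on the left-hand side): class c("mlm", "lm")
multi_fit <- lm(cbind(mpg, hp) ~ disp, data = mtcars)
class(multi_fit)

# Analysis of variance table for the single-response model
anova(fit)

# Accessor functions extract pieces of the fitted model
coefficients(fit)
head(fitted.values(fit))
head(residuals(fit))
```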
Linear regression in R is a method used to predict the value of a variable using the value(s) of one or more input predictor variables. The goal of linear regression is to establish a linear relationship between the desired output variable and the input predictors.
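To illustrate the prediction step, here is a small sketch that fits mpg on disp and then predicts mpg for displacement values that are not taken from the data frame. The new disp values (100 and 200) are arbitrary and chosen only for the example.

```r
# Fit the model on the built-in mtcars data
fit <- lm(mpg ~ disp, data = mtcars)

# New observations must use the same predictor name as the formula
new_cars <- data.frame(disp = c(100, 200))

# Predicted mpg for the new displacement values
pred <- predict(fit, newdata = new_cars)
pred
```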
You can fix this problem by using quo_name() and formula():
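library(rlang)  # provides enquo() and quo_name()

lmfun <- function(data, y, x) {
  # Capture the unquoted column names passed by the caller
  y <- enquo(y)
  x <- enquo(x)
  # Build the formula from the column names as strings, e.g. "mpg~disp"
  model_formula <- formula(paste0(quo_name(y), "~", quo_name(x)))
  lm(model_formula, data = data)
}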
lmfun(mtcars, mpg, disp)
# Call:
# lm(formula = model_formula, data = data)
#
# Coefficients:
# (Intercept) disp
# 29.59985 -0.04122
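If you would rather avoid the rlang dependency, a base R variant is possible. The sketch below is an alternative to the approach above, not the same method: it uses substitute() and deparse() to turn the unquoted arguments into strings, and reformulate() to build the formula. The name lmfun_base is made up for this example.

```r
# Base R sketch: no packages beyond the default stats package are needed
lmfun_base <- function(data, y, x) {
  # substitute() captures the unevaluated argument, deparse() turns it into text
  y_name <- deparse(substitute(y))
  x_name <- deparse(substitute(x))
  # reformulate() builds the formula y_name ~ x_name from the strings
  f <- reformulate(x_name, response = y_name)
  lm(f, data = data)
}

fit <- lmfun_base(mtcars, mpg, disp)
coef(fit)
```

As with the quo_name() version, the printed call shows the name of the formula variable rather than the expanded formula, but the fitted coefficients are the same as for lm(mpg ~ disp, data = mtcars).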