I tried to run anova on different sets of data and didn't quite know how to do it. I goolged and found this to be useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/
hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})
My understanding of what the above codes does is it creates a function lm() and apply it to each variable in varlist and it does linear regression on each of them.
So I thought use aov instead of lm would work for me like this:
aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
However, I got this error:
Error in terms.default(formula, "Error", data = data) :
no terms component nor attribute
I have no idea of where the error comes from. Please help!
This should do it. The varlist vector is going to be passed item by item to the function and the column will be delivered. The lm function will only see a two column dataframe and the "read" column will be the dependent variable each time. No need for fancy substitution:
models <- sapply(varlist, function(x) {
lm(read ~ ., data = hsb2[, c("read", x) ])
}, simplify=FALSE)
> summary(models[[1]]) # The first model. Note the use of "[["
Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])
Residuals:
Min 1Q Median 3Q Max
-19.8565 -5.8976 -0.8565 5.5801 24.2703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.16215 3.30716 5.492 1.21e-07 ***
write 0.64553 0.06168 10.465 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529
F-statistic: 109.5 on 1 and 198 DF, p-value: < 2.2e-16
For all the models::
lapply(models, summary)
The problem is that substitute()
returns an expression, not a formula. I think @thelatemail's suggestion of
lm(as.formula(paste("read ~",x)), data = hsb2)
is a good work around. Alternatively you could evaluate the expression to get the formula with
models <- lapply(varlist, function(x) {
aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})
I guess it depends on what you want to do with the list of models afterward. Doing
models <- lapply(varlist, function(x) {
eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})
gives a "cleaner" call
property for each of the result.
akrun borrowed my answer the other night, now I'm (partially) borrowing his.
do.call
puts the variables into the call
output so it reads properly. Here's a general function for simple regression.
doModel <- function(col1, col2, data = hsb2, FUNC = "lm")
{
form <- as.formula(paste(col1, "~", col2))
do.call(FUNC, list(form, substitute(data)))
}
lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept) write
# 18.1622 0.6455
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept) math
# 14.0725 0.7248
#
# ...
# ...
# ...
Note: As thelatemail mentions in his comment
sapply(varlist, doModel, col1 = "read", simplify = FALSE)
will keep the names in the list and also allow list$name
subsetting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With