Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

substitute in r together with anova

Tags:

r

anova

I tried to run anova on different sets of data and didn't quite know how to do it. I goolged and found this to be useful: https://stats.idre.ucla.edu/r/codefragments/looping_strings/

hsb2 <- read.csv("https://stats.idre.ucla.edu/stat/data/hsb2.csv")
names(hsb2)
varlist <- names(hsb2)[8:11]
models <- lapply(varlist, function(x) {
lm(substitute(read ~ i, list(i = as.name(x))), data = hsb2)
})

My understanding of what the above codes does is it creates a function lm() and apply it to each variable in varlist and it does linear regression on each of them.

So I thought use aov instead of lm would work for me like this:

aov(substitute(read ~ i, list(i = as.name(x))), data = hsb2)

However, I got this error:

Error in terms.default(formula, "Error", data = data) : 
no terms component nor attribute

I have no idea of where the error comes from. Please help!

like image 781
olala Avatar asked Sep 23 '14 05:09

olala


3 Answers

This should do it. The varlist vector is going to be passed item by item to the function and the column will be delivered. The lm function will only see a two column dataframe and the "read" column will be the dependent variable each time. No need for fancy substitution:

models <- sapply(varlist, function(x) {
lm(read ~ .,  data = hsb2[, c("read", x) ])
}, simplify=FALSE)

> summary(models[[1]])  # The first model. Note the use of "[["

Call:
lm(formula = read ~ ., data = hsb2[, c("read", x)])

Residuals:
     Min       1Q   Median       3Q      Max 
-19.8565  -5.8976  -0.8565   5.5801  24.2703 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.16215    3.30716   5.492 1.21e-07 ***
write        0.64553    0.06168  10.465  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 8.248 on 198 degrees of freedom
Multiple R-squared: 0.3561, Adjusted R-squared: 0.3529 
F-statistic: 109.5 on 1 and 198 DF,  p-value: < 2.2e-16 

For all the models::

lapply(models, summary)
like image 34
IRTFM Avatar answered Nov 06 '22 08:11

IRTFM


The problem is that substitute() returns an expression, not a formula. I think @thelatemail's suggestion of

lm(as.formula(paste("read ~",x)), data = hsb2)

is a good work around. Alternatively you could evaluate the expression to get the formula with

models <- lapply(varlist, function(x) {
    aov(eval(substitute(read ~ i, list(i = as.name(x)))), data = hsb2)
})

I guess it depends on what you want to do with the list of models afterward. Doing

models <- lapply(varlist, function(x) {
    eval(bquote(aov(read ~ .(as.name(x)), data = hsb2)))
})

gives a "cleaner" call property for each of the result.

like image 196
MrFlick Avatar answered Nov 06 '22 06:11

MrFlick


akrun borrowed my answer the other night, now I'm (partially) borrowing his.

do.call puts the variables into the call output so it reads properly. Here's a general function for simple regression.

doModel <- function(col1, col2, data = hsb2, FUNC = "lm") 
{
    form <- as.formula(paste(col1, "~", col2))
    do.call(FUNC, list(form, substitute(data)))
}     

lapply(varlist, doModel, col1 = "read")
# [[1]]
#
# Call:
# lm(formula = read ~ write, data = hsb2)
#
# Coefficients:
# (Intercept)        write  
#     18.1622       0.6455  
#
#
# [[2]]
#
# Call:
# lm(formula = read ~ math, data = hsb2)
#
# Coefficients:
# (Intercept)         math  
#     14.0725       0.7248  
#
# ...
# ...
# ...

Note: As thelatemail mentions in his comment

sapply(varlist, doModel, col1 = "read", simplify = FALSE)

will keep the names in the list and also allow list$name subsetting.

like image 4
Rich Scriven Avatar answered Nov 06 '22 08:11

Rich Scriven