I'd like to create a function that automatically generates univariate and multivariate regression analyses, but I can't figure out how to specify **variables in vectors**. This seems very easy, but after skimming the documentation I haven't figured it out so far.
A simple example:
a<-rnorm(100)
b<-rnorm(100)
k<-c("a","b")
d<-c(a,b)
summary(k[1])
But k[1] is "a", which is a character vector, and d is just b appended to a, not the variable names. In effect, I'd like k[1] to represent the vector a.
Appreciate any answers...
//M
You could use a list: k=list(a,b). This creates a list with components a and b, but it is not a list of variable names.
get() is what you're looking for :
summary(get(k[1]))
Edit: get() is not what you're looking for; list() is. get() could still be useful, though.
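To make the difference between get() and a list concrete, here is a minimal sketch using the a, b and k from the question. It assumes the variables live in the environment where the code runs; mget() is the vectorised counterpart of get() and returns a named list, which is usually easier to work with afterwards.

```r
a <- rnorm(100)
b <- rnorm(100)
k <- c("a", "b")

# get() looks up one object by name
summary(get(k[1]))

# mget() looks up several names at once and returns a named list
# (envir = environment() is spelled out here to make the lookup location explicit)
vars <- mget(k, envir = environment())

# Once the vectors are in a list, lapply() runs the same code on each
summaries <- lapply(vars, summary)
summaries
```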
If you're looking for automatic generation of regression analyses, you might actually benefit from using eval(), although most R programmers will warn you against using eval() unless you know very well what you're doing. Please read the help files for eval() and parse() very carefully before you use them.
An example :
d <- data.frame(
var1 = rnorm(1000),
var2 = rpois(1000,4),
var3 = sample(letters[1:3],1000,replace=TRUE)
)
vars <- names(d)
auto.lm <- function(d,dep,indep){
expr <- paste(
"out <- lm(",
dep,
"~",
paste(indep,collapse="*"),
",data=d)"
)
eval(parse(text=expr))
return(out)
}
auto.lm(d,vars[1],vars[2:3])
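For comparison, the same model can be built without eval() and parse(). The sketch below is one possible variant, not a replacement the original answer proposed: it uses reformulate() from the stats package to turn the character names into a formula. The function name auto.lm2 is hypothetical, and the data frame is recreated only so the block runs on its own.

```r
# Same kind of example data as above, recreated so this sketch is self-contained
set.seed(1)
d <- data.frame(
  var1 = rnorm(1000),
  var2 = rpois(1000, 4),
  var3 = sample(letters[1:3], 1000, replace = TRUE)
)

# auto.lm2 is a hypothetical name for an eval()-free version of auto.lm
auto.lm2 <- function(d, dep, indep) {
  # Build dep ~ indep1 * indep2 * ... as a formula object
  f <- reformulate(paste(indep, collapse = "*"), response = dep)
  lm(f, data = d)
}

fit <- auto.lm2(d, "var1", c("var2", "var3"))
formula(fit)
```

Because the formula is built as an object rather than as text to be evaluated, there is no string of code for a stray variable name to break.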
You can use the "get" function to get an object based on a character string of its name, but in the long run it is better to store the variables in a list and access them that way. Things become much simpler: you can grab subsets, and you can use lapply or sapply to run the same code on every element. When saving or deleting, you can work on the entire list rather than trying to remember every element, e.g.:
mylist <- list(a=rnorm(100), b=rnorm(100) )
names(mylist)
summary(mylist[[1]])
# or
summary(mylist[['a']])
# or
summary(mylist$a)
# or
d <- 'a'
summary(mylist[[d]])
# or
lapply( mylist, summary )
If you are programmatically creating models for analysis with lm (or other modeling functions), then one approach is to just subset your data and use the ".", e.g.:
yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
fit <- lm( Sepal.Width ~ ., data=iris[, c(yvar,xvars)] )
Or you can build the formula using "paste" or "sprintf" then use "as.formula" to convert it to a formula, e.g.:
yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
my.formula <- paste( yvar, '~', paste( xvars, collapse=' + ' ) )
my.formula <- as.formula(my.formula)
fit <- lm( my.formula, data=iris )
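Putting the formula-building step into a small function gives the univariate and multivariate fits asked about in the question. This is only a sketch under assumptions: fit_lm is a hypothetical helper name, and iris stands in for the real data.

```r
yvar <- "Sepal.Width"
xvars <- c("Petal.Width", "Sepal.Length")

# fit_lm is a hypothetical helper: builds y ~ x1 + x2 + ... and fits it
fit_lm <- function(y, x, data) {
  f <- as.formula(paste(y, "~", paste(x, collapse = " + ")))
  lm(f, data = data)
}

# One univariate model per predictor, stored in a named list
uni_fits <- lapply(setNames(xvars, xvars), function(x) fit_lm(yvar, x, iris))

# One multivariate model with all predictors
multi_fit <- fit_lm(yvar, xvars, iris)

lapply(uni_fits, coef)
coef(multi_fit)
```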
Note also the problem of multiple comparisons if you are looking at many different models fit automatically.
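As a rough illustration of one way to account for this, the p-values from a set of automatically fitted univariate models can be passed through p.adjust(). The choice of the Holm method and of these three iris predictors is an assumption made for the example, not a recommendation for any particular analysis.

```r
yvar <- "Sepal.Width"
xvars <- c("Petal.Width", "Sepal.Length", "Petal.Length")

# Slope p-value from each univariate model
pvals <- sapply(xvars, function(x) {
  fit <- lm(reformulate(x, response = yvar), data = iris)
  summary(fit)$coefficients[x, "Pr(>|t|)"]
})

# Holm adjustment for the number of models examined
p_holm <- p.adjust(pvals, method = "holm")
cbind(raw = pvals, holm = p_holm)
```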