
vector of variable names in R

I'd like to create a function that automatically generates univariate and multivariate regression analyses, but I can't figure out how to specify **variables in vectors**. This seems very easy, but after skimming the documentation I haven't figured it out so far.

Easy example

a<-rnorm(100)
b<-rnorm(100)
k<-c("a","b")
d<-c(a,b)
summary(k[1])

But k[1] is "a", a character vector, so summary() describes the string rather than the data. And d is just b appended to a, not the variable names. In effect, I'd like k[1] to represent the vector a.

Appreciate any answers...

//M

Misha asked Aug 27 '10 15:08


3 Answers

You could use a list: k = list(a, b). This creates a list with components a and b, but it is not a list of variable names.

Andrew Redd answered Sep 25 '22 02:09


get() is what you're looking for:

summary(get(k[1]))

Edit: get() is not what you're looking for; list() is. get() could be useful too, though.
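If you do want to look objects up by name, here is a minimal sketch using the question's a, b and k. get() fetches a single object, and mget() fetches several at once and returns them as a named list. The set.seed() call and the name `vars` are only there for illustration.

```r
set.seed(1)  # fixed seed, only so the example is reproducible
a <- rnorm(100)
b <- rnorm(100)
k <- c("a", "b")

# get() looks up one object from a character string holding its name
summary(get(k[1]))

# mget() looks up several names at once and returns a named list
vars <- mget(k, envir = environment())
lapply(vars, summary)
```

Once the data are in a list like `vars`, there is no further need for name lookups.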

If you're looking for automatic generation of regression analyses, you might actually benefit from using eval(), although every R programmer will warn you against using eval() unless you know very well what you're doing. Please read the help files for eval() and parse() very carefully before you use them.

An example:

d <- data.frame(
  var1 = rnorm(1000),
  var2 = rpois(1000,4),
  var3 = sample(letters[1:3],1000,replace=TRUE)
)

vars <- names(d)

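auto.lm <- function(d,dep,indep){
      # Build the call as text, e.g. "out <- lm( var1 ~ var2*var3 ,data=d)"
      expr <- paste(
          "out <- lm(",
          dep,
          "~",
          # "*" asks for main effects plus all interactions
          paste(indep,collapse="*"),
          ",data=d)"
      )
      # Parse the text and evaluate it inside the function; this creates `out`
      eval(parse(text=expr))
      return(out)
}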

auto.lm(d,vars[1],vars[2:3])
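
For comparison, here is a rough sketch of the same idea without eval(parse()). reformulate() from the stats package builds a formula from character strings, so lm() can be called directly. The name auto.lm2 is made up for this illustration, and the data frame repeats the example above with a fixed seed.

```r
set.seed(42)  # fixed seed, only so the example is reproducible
d <- data.frame(
  var1 = rnorm(1000),
  var2 = rpois(1000, 4),
  var3 = sample(letters[1:3], 1000, replace = TRUE)
)
vars <- names(d)

# Hypothetical variant of auto.lm() that never parses code from text
auto.lm2 <- function(d, dep, indep) {
  # "var2*var3" is passed as one term label, giving var1 ~ var2 * var3
  f <- reformulate(paste(indep, collapse = "*"), response = dep)
  lm(f, data = d)
}

fit <- auto.lm2(d, vars[1], vars[2:3])
coef(fit)
```

One visible difference: the stored call reads lm(formula = f, data = d) rather than showing the expanded formula, although formula(fit) still returns it.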

Joris Meys answered Sep 24 '22 02:09


You can use the "get" function to fetch an object from a character string of its name, but in the long run it is better to store the variables in a list and access them that way. Things become much simpler: you can grab subsets, and you can use lapply or sapply to run the same code on every element. When saving or deleting, you can work on the entire list rather than trying to remember every element. E.g.:

mylist <- list(a=rnorm(100), b=rnorm(100) )
names(mylist)
summary(mylist[[1]])
# or
summary(mylist[['a']])
# or
summary(mylist$a)
# or 
d <- 'a'
summary(mylist[[d]])

# or
lapply( mylist, summary )

If you are programmatically creating models for analysis with lm (or other modeling functions), then one approach is to subset your data and use the ".", e.g.:

yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
fit <- lm( Sepal.Width ~ ., data=iris[, c(yvar,xvars)] )

Or you can build the formula using "paste" or "sprintf" then use "as.formula" to convert it to a formula, e.g.:

yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
my.formula <- paste( yvar, '~', paste( xvars, collapse=' + ' ) )
my.formula <- as.formula(my.formula)
fit <- lm( my.formula, data=iris )
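
Putting this together for the original question, a small sketch that fits one univariate model per predictor plus one multivariate model, using the same paste/as.formula approach. The helper name fit_model and the result names uni_fits and multi_fit are made up for illustration.

```r
yvar  <- "Sepal.Width"
xvars <- c("Petal.Width", "Sepal.Length")

# Hypothetical helper: build "y ~ x1 + x2 + ..." from strings and fit it
fit_model <- function(y, x, data) {
  f <- as.formula(paste(y, "~", paste(x, collapse = " + ")))
  lm(f, data = data)
}

# One univariate model per predictor, stored in a named list
uni_fits <- lapply(xvars, function(x) fit_model(yvar, x, iris))
names(uni_fits) <- xvars

# One multivariate model with all predictors
multi_fit <- fit_model(yvar, xvars, iris)

lapply(uni_fits, coef)
coef(multi_fit)
```

Keeping the fits in a list means the same lapply/sapply pattern works for pulling out coefficients, summaries or residuals.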

Note also the problem of multiple comparisons if you are looking at many different models fit automatically.
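
As a rough illustration of that last point, the sketch below collects the slope p-value from several univariate fits and adjusts them with p.adjust() from the stats package. The Holm method is just one possible choice here, and whether any adjustment suits your analysis is a judgment call; the predictors and the names p_raw and p_holm are picked only for the example.

```r
xvars <- c("Sepal.Length", "Petal.Length", "Petal.Width")

# Slope p-value from each univariate model of Sepal.Width
p_raw <- sapply(xvars, function(x) {
  fit <- lm(as.formula(paste("Sepal.Width ~", x)), data = iris)
  summary(fit)$coefficients[x, "Pr(>|t|)"]
})

# Holm adjustment for the number of models examined
p_holm <- p.adjust(p_raw, method = "holm")

cbind(raw = p_raw, holm = p_holm)
```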

Greg Snow answered Sep 23 '22 02:09