Regress each column in a data frame on a vector in R

Tags: r, statistics

I want to regress each column in a data set on a vector, then return the column that has the highest R-squared value. For example, I have a vector HAPPY <- c(3,2,2,3,1,3,1,3) and the following data set:

HEALTH CONINC MARITAL SATJOB1 MARITAL2 HAPPY
     3    441       5       1        2     3
     1   1764       5       1        2     2
     2   3087       5       1        2     2
     3   3087       5       1        2     3
     1   3969       2       1        5     1
     1   3969       5       1        2     3
     2   4852       5       1        2     2
     3   5734       3       1        3     3

Regress each of the columns to the left of HAPPY on HAPPY (for example, lm(HEALTH ~ HAPPY)), then return the column with the highest R-squared. If HEALTH had the highest R-squared value, the result should be HEALTH.

I've tried apply, but can't seem to figure out how to return the regression with the highest R-squared. Any suggestions?

bstockton asked Apr 20 '12 06:04


1 Answer

I would break this up into two steps:

1) Compute the R-squared for each model

2) Determine which is the highest value

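# Example data: three random predictor columns plus the outcome as the last column
mydf <- data.frame(aa = rpois(8, 4), bb = rpois(8, 2), cc = rbinom(8, 1, .5),
                   happy = c(3, 2, 2, 3, 1, 3, 1, 3))

# Step 1: R-squared of each column regressed on happy (last column excluded)
myRes <- sapply(mydf[-ncol(mydf)], function(x) {
  mylm <- lm(x ~ mydf$happy)
  theR2 <- summary(mylm)$r.squared
  return(theR2)
})

# Step 2: name(s) of the column(s) with the highest R-squared
names(myRes[which(myRes == max(myRes))])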

This assumes that happy is stored in your data.frame as its last column, since mydf[-ncol(mydf)] drops the last column before looping.
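In the question, HAPPY is a separate vector rather than a column, and SATJOB1 is constant (every value is 1), so its R-squared is undefined and can turn the max() comparison into NaN. A minimal sketch adapting the approach above to that case is shown below. It passes the vector in directly, skips zero-variance columns, and uses which.max() to pick the winner. The helper name best_r2_column and the data frame name survey are hypothetical, chosen only for illustration; the values are copied from the table in the question.

```r
# Outcome vector from the question, kept outside the data frame
HAPPY <- c(3, 2, 2, 3, 1, 3, 1, 3)

# Predictor columns from the question's table (HAPPY column left out)
survey <- data.frame(
  HEALTH   = c(3, 1, 2, 3, 1, 1, 2, 3),
  CONINC   = c(441, 1764, 3087, 3087, 3969, 3969, 4852, 5734),
  MARITAL  = c(5, 5, 5, 5, 2, 5, 5, 3),
  SATJOB1  = c(1, 1, 1, 1, 1, 1, 1, 1),
  MARITAL2 = c(2, 2, 2, 2, 5, 2, 2, 3)
)

# Hypothetical helper: regress each column of df on the vector y and
# return the name of the column whose model has the highest R-squared.
best_r2_column <- function(df, y) {
  # Constant columns (like SATJOB1) have no variance to explain,
  # so their R-squared is undefined; skip them.
  varying <- vapply(df, function(x) length(unique(x)) > 1, logical(1))
  r2 <- vapply(df[varying],
               function(x) summary(lm(x ~ y))$r.squared,
               numeric(1))
  # which.max() returns the first maximum, keeping its name
  names(which.max(r2))
}

best_r2_column(survey, HAPPY)
# [1] "HEALTH"
```

Because each model has a single predictor, its R-squared equals the squared correlation between the two variables, so lm(HEALTH ~ HAPPY) and lm(HAPPY ~ HEALTH) give the same value and the direction of the regression does not change which column wins. Unlike which(myRes == max(myRes)), which.max() returns only the first column in the event of a tie.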

BenBarnes answered Sep 21 '22 02:09