I want to regress each column in a data set on a vector, then return the column with the highest R-squared value. For example, I have a vector HAPPY <- c(3,2,2,3,1,3,1,3) and the following data set:
HEALTH CONINC MARITAL SATJOB1 MARITAL2 HAPPY
     3    441       5       1        2     3
     1   1764       5       1        2     2
     2   3087       5       1        2     2
     3   3087       5       1        2     3
     1   3969       2       1        5     1
     1   3969       5       1        2     3
     2   4852       5       1        2     2
     3   5734       3       1        3     3
I want to regress HAPPY on each of the other columns in the data set above, then return the column with the highest R-squared. For example, if lm(HEALTH ~ HAPPY) gave the highest R-squared value, the result should be HEALTH.
I've tried apply, but can't seem to figure out how to return the regression with the highest R-squared. Any suggestions?
I would break this up into two steps:
1) Determine the R-squared for each model
2) Determine which is the highest value
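# Example data: three random predictors plus happy as the last column
mydf <- data.frame(aa = rpois(8, 4), bb = rpois(8, 2), cc = rbinom(8, 1, .5),
                   happy = c(3, 2, 2, 3, 1, 3, 1, 3))
# Step 1: R-squared for each column (all but the last) regressed on happy
myRes <- sapply(mydf[-ncol(mydf)], function(x) {
  mylm <- lm(x ~ mydf$happy)
  summary(mylm)$r.squared
})
# Step 2: name of the column with the highest R-squared (the first one if tied)
names(myRes)[which.max(myRes)]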
This assumes that happy is the last column of your data.frame.
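The same two steps can be sketched on the data from the question rather than on random data. This is only an illustration under a few assumptions: the names dat, predictors, r2 and best are made up here, and HAPPY is taken from the data set's own column (which differs from the standalone vector in the question at the seventh value). The model is written as HAPPY ~ column; with a single predictor the R-squared is the same in either direction, and this direction also copes with a constant column such as SATJOB1, which simply gets an R-squared of 0.

```r
# Data copied from the question; HAPPY is the data set's own column.
dat <- data.frame(
  HEALTH   = c(3, 1, 2, 3, 1, 1, 2, 3),
  CONINC   = c(441, 1764, 3087, 3087, 3969, 3969, 4852, 5734),
  MARITAL  = c(5, 5, 5, 5, 2, 5, 5, 3),
  SATJOB1  = c(1, 1, 1, 1, 1, 1, 1, 1),
  MARITAL2 = c(2, 2, 2, 2, 5, 2, 2, 3),
  HAPPY    = c(3, 2, 2, 3, 1, 3, 2, 3)
)

# Every column except the response (selected by name, not by position)
predictors <- setdiff(names(dat), "HAPPY")

# Step 1: fit one simple regression per column and keep its R-squared
r2 <- sapply(predictors, function(col) {
  fit <- lm(dat$HAPPY ~ dat[[col]])
  summary(fit)$r.squared
})

# Step 2: name of the column with the largest R-squared
best <- names(r2)[which.max(r2)]
best
```

On this data the result should be MARITAL2 (R-squared of about 0.395), narrowly ahead of HEALTH (about 0.387). Selecting the response by name instead of by position means HAPPY does not have to be the last column.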