 

Linear Regression in R with variable number of explanatory variables [duplicate]

Possible Duplicate:
Specifying formula in R with glm without explicit declaration of each covariate
how to succinctly write a formula with many variables from a data frame?

I have a vector of Y values and a matrix of X values that I want to perform a multiple regression on (i.e. Y = X[column 1] + X[column 2] + ... + X[column N]).

The problem is that the number of columns in my matrix (N) is not known in advance. As far as I know, to perform a linear regression in R you have to spell out the equation:

fit = lm(Y~X[,1]+X[,2]+X[,3])

But how do I do this if I don't know how many columns are in my X matrix?

Thanks!

Michael asked Dec 07 '22 18:12



1 Answer

Three ways, in increasing level of flexibility.

Method 1

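If Y and all the explanatory variables are already columns of a single data frame dat, run your regression using the formula shorthand ".", which stands for every column of dat except the response: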

fit <- lm( Y ~ . , data=dat )
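A minimal sketch of Method 1, assuming the response and predictors already live in one data frame. The data are simulated, and the column names a, b and c are made up purely for illustration:

```r
set.seed(1)  # reproducible simulated data
n <- 50
# Hypothetical data frame: response Y plus three predictor columns
dat <- data.frame(Y = rnorm(n), a = rnorm(n), b = rnorm(n), c = rnorm(n))

# "." expands to every column of dat except the response Y
fit <- lm(Y ~ ., data = dat)
names(coef(fit))  # returns c("(Intercept)", "a", "b", "c")
```

Adding or dropping columns of dat changes the model without touching the formula.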

Method 2

Put all your data in one data.frame, not two:

dat <- cbind(data.frame(Y=Y),as.data.frame(X))

Then run your regression using the formula notation:

fit <- lm( Y~. , data=dat )
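A self-contained sketch of Method 2 with simulated data. The number of columns N is drawn at random to stand in for "not known in advance"; n, N and the simulated values are illustrative assumptions, not taken from the question. Because X has no column names here, as.data.frame() names its columns V1, V2, and so on. For comparison, the sketch also fits lm(Y ~ X), since lm() accepts a matrix term directly and gives the same coefficients (named X1, X2, ... instead of V1, V2, ...).

```r
set.seed(42)
n <- 100
N <- sample(2:6, 1)                     # number of predictors, not fixed in advance
X <- matrix(rnorm(n * N), nrow = n)     # n x N predictor matrix without column names
Y <- drop(X %*% seq_len(N)) + rnorm(n)  # simulated response

# One data frame: column Y, then V1, ..., VN from the matrix
dat <- cbind(data.frame(Y = Y), as.data.frame(X))
fit <- lm(Y ~ ., data = dat)
length(coef(fit))  # N + 1: intercept plus one slope per column of X

# Same model with the matrix used directly as a single formula term
fit_matrix <- lm(Y ~ X)
all.equal(unname(coef(fit)), unname(coef(fit_matrix)))  # TRUE
```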

Method 3

Another way is to build the formula yourself:

# Build the formula as text, e.g. "Y ~ V1 + V2 + V3"
model1.form.text <- paste("Y ~", paste(xvars, collapse=" + "))
model1.form <- as.formula( model1.form.text )
model1 <- lm( model1.form, data=dat )

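In this example, xvars is a character vector containing the names of the columns of dat you want to use as explanatory variables. To use all of them, setdiff(names(dat), "Y") gives every column except the response.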
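A runnable sketch of Method 3, assuming dat is built as in Method 2 (simulated again here so the block stands on its own). Taking xvars to be every column except Y is an illustrative choice; any subset of column names works the same way. The last lines use base R's reformulate() from the stats package, which builds the same formula without pasting strings.

```r
set.seed(7)
n <- 60
X <- matrix(rnorm(n * 4), nrow = n)  # simulated 4-column predictor matrix
Y <- rnorm(n)
dat <- cbind(data.frame(Y = Y), as.data.frame(X))

# Names of the predictors to include (here: all columns other than Y)
xvars <- setdiff(names(dat), "Y")

# Build "Y ~ V1 + V2 + V3 + V4" as text, then convert it to a formula
model1.form.text <- paste("Y ~", paste(xvars, collapse = " + "))
model1.form <- as.formula(model1.form.text)
model1 <- lm(model1.form, data = dat)

# Equivalent formula built with stats::reformulate()
model2 <- lm(reformulate(xvars, response = "Y"), data = dat)
all.equal(coef(model1), coef(model2))  # TRUE
```

Building the formula explicitly is what makes this the most flexible method: xvars can be any computed subset of columns, not just "everything but Y".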

Ari B. Friedman answered Dec 09 '22 08:12
