Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Estimating many interaction terms in glmnet

Tags:

r

formula

glmnet

It is my understanding that glmnet takes matrices where each column is an explanatory variable.

I have a dataframe with ~10 explanatory variables (some of which are factors)

How could I take a formula such as y~(x1*x2*x3)+(x4*x5)+x6 and estimate this using glmnet?

I believe I would have to create a matrix where each interaction term has its own column, but I have no idea how simply take the inputs the formula and variables(some of which are factors) and get the output a matrix I can easily put into glmnet.

like image 497
Will Beauchamp Avatar asked Apr 30 '13 12:04

Will Beauchamp


1 Answers

Assume you want a model of the form y = b0 + b1*x1*x2 + b2*x3 + noise, where the target variable y and all the explanatory variables x1, x2, x3 are stored in the same dataframe. ...

Edit: Thanks @BenBolker for the hint to model.matrix.

Using model.matrix the following code provides a solution:

library(glmnet)

# the original data frame and formula
set.seed(23)
dat <- data.frame(y=runif(5), x1=runif(5), x2=runif(5), x3=runif(5))
f <- as.formula(y~x1:x2+x3+0)
# no intercept here ('+0') because glmnet adds intercept by default

# transform dataframe to matrices as required by glmnet
x <- model.matrix(f, dat)
y <- as.matrix(dat$y, ncol=1)

# fit glmnet model with penalty parameter 0.001
g <- glmnet(x, y, lambda=0.001)
print(coef(g))
#   3 x 1 sparse Matrix of class "dgCMatrix"
#                   s0
# (Intercept) 0.3506450
# x3          0.2308045
# x1:x2       0.1016138

Only for completeness, here is my original answer without using model.matrix, which requires a bit of manual intervention:

library(glmnet)

# the original data frame
set.seed(23)
dat <- data.frame(y=runif(5), x1=runif(5), x2=runif(5), x3=runif(5))

# transform dataframe to matrices as required by glmnet
x <- with(dat, as.matrix(cbind("x1*x2"=x1*x2, "x3"=x3)))
y <- with(dat, as.matrix(y, ncol=1))

# fit glmnet model with penalty parameter 0.001
g <- glmnet(x, y, lambda=0.001)
print(coef(g))
#   3 x 1 sparse Matrix of class "dgCMatrix"
#                   s0
# (Intercept) 0.3506450
# x1*x2       0.1016137
# x3          0.2308045
like image 165
sieste Avatar answered Oct 17 '22 22:10

sieste