Interpreting coefficient names in glmnet in R

I am using glmnet to predict probabilities from a set of 5 features with the following code. I need the actual formula because I have to use it in a different (non-R) program.

deg = 3

glmnet.fit <- cv.glmnet(poly(train.matrix,degree=deg),train.result,alpha=0.05,family='binomial')

The names of the resulting coefficients have five positions (I assume one for each feature), and each position is a number between 0 and 3 (I assume this is the degree of the polynomial). But I am still confused about how exactly to reconstruct the formula.
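A quick way to check the naming convention is to build a small polynomial basis and inspect it. The sketch below uses two hypothetical random features A and B (not the question's data). Each column name lists one exponent per input column, separated by dots, but only with raw = TRUE is a column the plain monomial. With poly()'s default raw = FALSE, which the question's code uses, the columns carry the same names but are products of orthogonal polynomials, so a coefficient multiplies something like P1(B) * P1(E) rather than B * E.

```r
# Two hypothetical random features, only to inspect poly()'s output
set.seed(1)
A <- runif(20)
B <- runif(20)
X <- cbind(A = A, B = B)

# raw = TRUE: each column is a plain monomial A^i * B^j,
# and its name "i.j" lists the exponents of A and B in column order
P_raw <- poly(X, degree = 2, raw = TRUE)
print(colnames(P_raw))

stopifnot(isTRUE(all.equal(unname(P_raw[, "1.1"]), A * B)))  # A^1 * B^1
stopifnot(isTRUE(all.equal(unname(P_raw[, "0.2"]), B^2)))    # A^0 * B^2

# raw = FALSE (the default, as in the question): same names, but each column
# is a product of orthogonal polynomials in A and B, not the raw monomial
P_orth <- poly(X, degree = 2)
stopifnot(identical(colnames(P_orth), colnames(P_raw)))
stopifnot(!isTRUE(all.equal(unname(P_orth[, "1.1"]), A * B)))
```

So the exponent reading of the names is right, but the coefficients only multiply raw products of the features if the model was fitted with raw = TRUE.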

Take these for example:

> coef(glmnet.fit,s= best.lambda)  
(Intercept) -2.25e-01  
...
0.1.0.0.1    3.72e+02
1.1.0.0.1    9.22e+04
0.2.0.0.1    6.17e+02
...

Let's call the features A,B,C,D,E. Is this how the formula should be interpreted?

Y =
-2.25e-01 +
...
3.72e+02 * (B * E) +
9.22e+04 * (A * B * E) +
6.17e+02 * (B^2 + E) +
...

If that is not correct how should I interpret it?
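Assuming the model is refitted with poly(..., raw = TRUE), the three names above can be checked term by term. This sketch uses five hypothetical random features A to E. It confirms that each position is the exponent of one feature, so 0.2.0.0.1 is the product B^2 * E. It also recalls that with family = 'binomial' the weighted sum is the log-odds (the linear predictor), which the logistic function turns into a probability.

```r
# Five hypothetical random features A..E, for illustration only
set.seed(2)
n <- 10
X <- matrix(runif(n * 5), ncol = 5,
            dimnames = list(NULL, c("A", "B", "C", "D", "E")))
P <- poly(X, degree = 3, raw = TRUE)

A <- X[, "A"]; B <- X[, "B"]; E <- X[, "E"]

# Name positions are the exponents of A, B, C, D, E, in that order
stopifnot(isTRUE(all.equal(unname(P[, "0.1.0.0.1"]), B * E)))
stopifnot(isTRUE(all.equal(unname(P[, "1.1.0.0.1"]), A * B * E)))
stopifnot(isTRUE(all.equal(unname(P[, "0.2.0.0.1"]), B^2 * E)))  # a product, not a sum

# With family = "binomial", the weighted sum is the log-odds;
# the probability is its logistic transform
eta <- c(-2, 0, 1.5)          # hypothetical linear-predictor values
p <- 1 / (1 + exp(-eta))
stopifnot(isTRUE(all.equal(p, plogis(eta))))
```

Note that this identity does not hold for the question's fit as written, because poly() there uses its default orthogonal basis.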

I saw the following question and answer but it didn't address these types of coefficient names.

Thanks in advance for your help.

asked Jun 21 '12 at 15:06 by dougp



1 Answer

Usually, we would just use the predict function. In your case, you need the coefficients themselves so that you can use them in another program. We can check that predict and multiplying the data by the coefficients give the same result.

# example data
# (ElemStatLearn has been archived on CRAN; if install.packages() fails,
# it can be installed from the CRAN archive)
library(ElemStatLearn)
library(glmnet)
data(prostate)

# training data
data.train <- prostate[prostate$train, ]
y <- data.train$lpsa

# isolate predictors (drop column 9, lpsa, and column 10, train)
data.train <- as.matrix(data.train[, -c(9, 10)])

# test data
data.test <- prostate[!prostate$train, ]
data.test <- as.matrix(data.test[, -c(9, 10)])

# fit training model (cv.glmnet assigns folds at random, so fix the seed)
set.seed(1)
myglmnet <- cv.glmnet(data.train, y)

# predictions by using the predict function
yhat_enet <- predict(myglmnet, newx = data.test, s = "lambda.min")

# predictions by using the coefficients
beta <- as.vector(t(coef(myglmnet, s = "lambda.min")))

# Coefficients are returned on the scale of the original data.
# Note that we need to add a column of 1s for the intercept.
testX <- cbind(1, data.test)
yhat2 <- testX %*% beta

# check by plotting predictions (the points should fall on the line y = x)
plot(yhat2, yhat_enet)
abline(0, 1)

So the first coefficient is the intercept, and each remaining coefficient corresponds to one column of your training matrix, in the same order. In sum, you can extract the coefficients and multiply them by the test data (with a leading column of 1s) to obtain the predictions you are interested in.
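For the non-R program, the name convention translates into a short loop: split each name on the dots, raise each feature to the listed power, multiply those together, and add up the weighted terms. The helper eval_poly_model below is hypothetical and assumes the model was fitted on poly(..., raw = TRUE); with poly()'s default orthogonal basis, the raw features would first have to be mapped through the same orthogonal polynomials (in R, via predict() on the poly object), which is much harder to port. It returns the linear predictor, and a cross-check against the poly() matrix is included.

```r
# Hypothetical helper: evaluate coefficients named in poly()'s "i.j.k..." style,
# ASSUMING the model was fitted on poly(..., raw = TRUE). Only base R is used.
eval_poly_model <- function(beta, x) {
  # beta: named numeric vector, "(Intercept)" plus one entry per poly() column
  # x:    raw feature values, in the same column order as the training matrix
  eta <- beta[["(Intercept)"]]
  for (nm in setdiff(names(beta), "(Intercept)")) {
    powers <- as.integer(strsplit(nm, ".", fixed = TRUE)[[1]])
    eta <- eta + beta[[nm]] * prod(x^powers)
  }
  eta  # linear predictor; for family = "binomial" this is the log-odds
}

# Cross-check against the poly() matrix, using random hypothetical coefficients
set.seed(3)
X <- matrix(runif(15), ncol = 3)
P <- poly(X, degree = 2, raw = TRUE)
beta <- c("(Intercept)" = 0.5, setNames(rnorm(ncol(P)), colnames(P)))

eta_loop   <- eval_poly_model(beta, X[1, ])
eta_matrix <- sum(c(1, P[1, ]) * beta)
stopifnot(isTRUE(all.equal(eta_loop, eta_matrix)))

prob <- 1 / (1 + exp(-eta_loop))  # probability, if the model is binomial
```

With a fitted glmnet model, a named vector in this format can be obtained with something like `beta <- as.matrix(coef(fit, s = "lambda.min"))[, 1]`, where `fit` stands for your cv.glmnet object.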

answered Sep 22 '22 at 00:09 by julieth
