
glmnet - variable importance?

Tags:

r

glmnet

I'm using the glmnet package to perform a LASSO regression. Is there a way to get the importance of the individual variables that were selected? I thought about ranking the coefficients returned by coef(...), i.e. the greater a coefficient's distance from zero, the more important the variable. Would that be a valid approach?

Thanks for your help!

cvfit = cv.glmnet(x, y, family = "binomial")
coef(cvfit, s = "lambda.min")

## 21 x 1 sparse Matrix of class "dgCMatrix"
##                    1
## (Intercept)  0.14936
## V1           1.32975
## V2           .      
## V3           0.69096
## V4           .      
## V5          -0.83123
## V6           0.53670
## V7           0.02005
## V8           0.33194
## V9           .      
## V10          .      
## V11          0.16239
## V12          .      
## V13          .      
## V14         -1.07081
## V15          .      
## V16          .      
## V17          .      
## V18          .      
## V19          .      
## V20         -1.04341
user86533 asked Feb 17 '16 16:02


2 Answers

The caret package computes variable importance for glmnet models in the following way.

To summarize, you can take the absolute value of the final coefficients and rank them. The ranked coefficients are your variable importance.

To view the source code, you can type

caret::getModelInfo("glmnet")$glmnet$varImp

If you don't want to depend on the caret package, you can copy the relevant lines from its source, shown below, and they should work on their own.

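varImp <- function(object, lambda = NULL, ...) {

  ## skipping a few lines

  ## Coefficients at the requested lambda (intercept included)
  beta <- predict(object, s = lambda, type = "coefficients")
  if (is.list(beta)) {
    ## Some fits (e.g. multinomial) return one coefficient matrix per class
    out <- do.call("cbind", lapply(beta, function(x) x[, 1]))
    out <- as.data.frame(out, stringsAsFactors = TRUE)
  } else {
    out <- data.frame(Overall = beta[, 1])
  }
  ## Drop the intercept and take absolute values
  out <- abs(out[rownames(out) != "(Intercept)", , drop = FALSE])
  out
}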

Finally, call the function with your fit.

varImp(cvfit, lambda = cvfit$lambda.min)
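
Note that the function returns the absolute coefficients in their original order; it does not sort them. A minimal sketch of the ranking step, using the nonzero coefficients from the coef() output in the question (the names `b`, `imp` and `ranked` are only illustrative, not part of caret or glmnet):

```r
# Nonzero coefficients copied from the coef() output in the question
# (intercept omitted; variables shrunk to zero are left out).
b <- c(V1 = 1.32975, V3 = 0.69096, V5 = -0.83123, V6 = 0.53670,
       V7 = 0.02005, V8 = 0.33194, V11 = 0.16239,
       V14 = -1.07081, V20 = -1.04341)

# Same shape as the varImp() result: one "Overall" column of absolute values
imp <- data.frame(Overall = abs(b))

# Rank from largest to smallest absolute coefficient
ranked <- imp[order(imp$Overall, decreasing = TRUE), , drop = FALSE]
ranked
```

With a real fit, the same `order()` call can be applied to the data frame returned by varImp().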
Boxuan answered Oct 18 '22 20:10


Before you compare the magnitudes of the coefficients, you should normalize them by multiplying each coefficient by the standard deviation of the corresponding predictor. This answer has more detail and useful links: https://stats.stackexchange.com/a/211396/34615
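
The reason this matters: glmnet standardizes the predictors internally by default, but it reports the coefficients back on the original scale, so a predictor measured in large units gets a small coefficient even when its effect is strong. A rough sketch of the rescaling on simulated data (the data, the seed, and the names `b`, `sds` and `std_imp` are assumptions made up for illustration):

```r
library(glmnet)

# Simulated data: V2 is on a scale 10x larger than the other predictors
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 2] * 10
colnames(x) <- paste0("V", 1:p)
y <- rbinom(n, 1, plogis(x[, 1] + 0.1 * x[, 2]))

cvfit <- cv.glmnet(x, y, family = "binomial")

# Coefficients at lambda.min as a named vector, intercept dropped
b <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]

# Standard deviation of each predictor, matched to the coefficients by name
sds <- apply(x, 2, sd)

# Scaled importance: |coefficient| * sd(predictor)
std_imp <- abs(b) * sds[names(b)]
sort(std_imp, decreasing = TRUE)
```

Comparing `abs(b)` with `std_imp` shows how much the ranking depends on the units of each predictor.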

Kent Johnson answered Oct 18 '22 21:10