GBM R function: get variable importance separately for each class

I am using the gbm function in R (gbm package) to fit stochastic gradient boosting models for multiclass classification. I am trying to obtain the importance of each predictor separately for each class, as in this figure from Hastie et al., The Elements of Statistical Learning (p. 382).

[Image: per-class variable importance figure from The Elements of Statistical Learning, p. 382]

However, the function summary.gbm only returns the overall importance of the predictors (their importance averaged over all classes).

Does anyone know how to get the relative importance values for each class?

Antoine asked Apr 14 '15 20:04



1 Answer

I think the short answer is that on page 379, Hastie mentions that he uses MART, which appears to be available only for S-PLUS.

I agree that the gbm package doesn't seem to expose the relative influence separately for each class. If that's something you're interested in for a multiclass problem, you could probably get something pretty similar by building a one-vs-all gbm for each of your classes and then getting the importance measures from each of those models.

So say your classes are a, b, c, & d. You model a vs. the rest and get the importance from that model. Then you model b vs. the rest and get the importance from that model. Etc.
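
A minimal sketch of that one-vs-rest idea is below. It makes several assumptions that are not from the question: the built-in iris data stands in for your data, the hyperparameters are arbitrary placeholders, and the name per_class_importance is made up for illustration. The numbers it gives come from separate binary models, so they are similar in spirit to the per-class plot in the book but not the same quantity.

```r
library(gbm)

set.seed(1)
data(iris)  # stand-in dataset (assumption); replace with your own data

classes    <- levels(iris$Species)
predictors <- setdiff(names(iris), "Species")

# Fit one binary (one-vs-rest) model per class and collect the
# relative influence of every predictor from each model.
per_class_importance <- sapply(classes, function(cl) {
  d   <- iris[, predictors]
  d$y <- as.integer(iris$Species == cl)  # 1 = this class, 0 = the rest

  fit <- gbm(y ~ ., data = d, distribution = "bernoulli",
             n.trees = 200, interaction.depth = 2, shrinkage = 0.05,
             bag.fraction = 0.5, n.minobsinnode = 5, verbose = FALSE)

  # summary.gbm returns a data frame with columns var and rel.inf,
  # normalized so that rel.inf sums to 100 within each model.
  ri <- summary(fit, n.trees = 200, plotit = FALSE)

  # Reorder to a fixed predictor order so the columns line up.
  setNames(ri$rel.inf, as.character(ri$var))[predictors]
})

# Rows are predictors, columns are classes.
print(round(per_class_importance, 1))
```

Each column can then be plotted on its own, for example with barplot(per_class_importance[, "setosa"]). Because each column is normalized within its own model, values are comparable within a class but only loosely across classes.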

Tchotchke answered Oct 18 '22 22:10