
How to obtain feature importance by class using ranger?

I have been using the ranger() and randomForest() functions in R. I am particularly interested in getting the importance of the features (predictors) for each class I am trying to predict, rather than the overall importance across all classes. I know how to do this with the importance() function from randomForest, where it appears to be the default behaviour when the model is fitted with importance = TRUE:

library(randomForest)
set.seed(100)
rfmodel <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)
importance(rfmodel)

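This returns a matrix with one column per class (setosa, versicolor, virginica) giving the importance of each feature for that class, followed by the overall MeanDecreaseAccuracy and MeanDecreaseGini columns.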
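For comparison, here is a minimal sketch that keeps only the class-specific columns of that matrix. It relies only on the column names importance() returns for a classification forest fitted with importance = TRUE. The names imp and class_imp are illustrative.

```r
library(randomForest)

set.seed(100)
rfmodel <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)

# Full matrix: one column per class, then MeanDecreaseAccuracy and MeanDecreaseGini
imp <- importance(rfmodel)

# Keep only the per-class columns, i.e. one column per level of the response
class_imp <- imp[, levels(iris$Species)]
class_imp
```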

With ranger, I am running:

library(ranger)
rangermodel <- ranger(Species ~ ., data = iris, num.trees = 1000, write.forest = TRUE, importance = "permutation", local.importance = TRUE)
importance(rangermodel)
rangermodel$variable.importance
rangermodel$variable.importance.local

rangermodel$variable.importance gives the importance of each feature for the classification problem as a whole, not by class. rangermodel$variable.importance.local gives the importance for each case (observation), but it is also not broken down by class.
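The following short sketch makes the two shapes concrete. The seed is an addition for reproducibility. The overall importance is a named vector with one value per predictor. The local importance is a matrix with one row per observation and one column per predictor. Neither object has a class dimension.

```r
library(ranger)

set.seed(100)  # added only so the run is reproducible
rangermodel <- ranger(Species ~ ., data = iris, num.trees = 1000,
                      importance = "permutation", local.importance = TRUE)

# Overall permutation importance: one number per predictor
length(rangermodel$variable.importance)

# Case-wise (local) importance: rows are observations, columns are predictors
dim(rangermodel$variable.importance.local)
```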

The ranger documentation does not seem to cover this. The only related question I could find is "How can I separate the overall variable importance values when using Random forest?", but it did not reach a conclusion on how to do this with ranger. Changing the ranger call as below did not give the output I am looking for either:

rangermodel <- ranger(Species ~ ., data = iris, num.trees = 1000, write.forest = TRUE, importance = "impurity")
asked Oct 21 '25 17:10 by Felipe Hernandes Coutinho


1 Answer

The idea is to use local (casewise) variable importance, defined as follows:

For each case, consider all the trees for which it is oob. Subtract the percentage of votes for the correct class in the variable-m-permuted oob data from the percentage of votes for the correct class in the untouched oob data. This is the local importance score for variable m for this case. Source: Breiman and Cutler's Random Forests website, section "Variable Importance".
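The following toy example illustrates that definition. It is not ranger's internal code. It computes the score for one case and one variable m from hypothetical OOB votes. The vote vectors are made up, and pct_correct is a helper introduced only for this sketch.

```r
# Made-up example: classes predicted for ONE case by the trees in which it is OOB
true_class      <- "versicolor"
votes_untouched <- c("versicolor", "versicolor", "virginica", "versicolor", "versicolor")
# Same trees, after permuting variable m in the OOB data (also made up)
votes_permuted  <- c("versicolor", "virginica", "virginica", "virginica", "versicolor")

# Hypothetical helper: percentage of votes that go to the correct class
pct_correct <- function(votes, cls) 100 * mean(votes == cls)

# Local importance of m for this case: untouched minus permuted (80 - 40)
local_imp_m <- pct_correct(votes_untouched, true_class) -
  pct_correct(votes_permuted, true_class)
local_imp_m  # 40
```

A positive score means that permuting m cost this case some correct-class votes. Repeating the calculation for every case and every variable gives a case-by-variable matrix of scores.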

To extract local variable importance from ranger, you need to specify both importance = "permutation" and local.importance = TRUE:

library(ranger)
rf.iris <- ranger(Species ~ ., iris, importance = "permutation", 
             local.importance = TRUE)
rf.iris$variable.importance.local

Then you can average the local importance of each variable within each class, for example with data.table:

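library(data.table)
# Attach the true class to each row of the case-wise importance matrix,
# then take the mean of every variable's column within each class
as.data.table(rf.iris$variable.importance.local)[, Species := iris$Species][, lapply(.SD, mean), by = Species]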
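If you would rather not depend on data.table, the following base R sketch does the same per-class averaging with aggregate(). Transposing the result gives a predictor-by-class matrix with one column per class, like the class-specific columns of importance(rfmodel). The seed and the object names loc, by_class and imp_mat are additions for this sketch. These values are means of vote-based local scores, so they are not expected to match randomForest's (by default scaled) class-specific values exactly.

```r
library(ranger)

set.seed(100)  # added only for reproducibility
rf.iris <- ranger(Species ~ ., iris, importance = "permutation",
                  local.importance = TRUE)

# Case-wise importance as a data frame: one row per observation
loc <- as.data.frame(rf.iris$variable.importance.local)

# Mean local importance of every predictor within each true class
by_class <- aggregate(loc, by = list(Species = iris$Species), FUN = mean)

# Predictors as rows, classes as columns
imp_mat <- t(as.matrix(by_class[, -1]))
colnames(imp_mat) <- as.character(by_class$Species)
imp_mat
```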

      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1:     setosa      0.01316     0.00252      0.11192     0.12548
2: versicolor      0.00800     0.00120      0.10672     0.11112
3:  virginica      0.01352     0.00316      0.10632     0.09956

answered Oct 23 '25 05:10 by Cazz


