I have been using the ranger and randomForest packages in R. I am particularly interested in getting the importance of features (predictors) for each class that I am trying to predict, rather than the overall importance for all classes together. I know how to do this with the importance() function from randomForest, where it seems to be the default behaviour once the model is fitted with importance = TRUE:
library(randomForest)
set.seed(100)
rfmodel <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)
importance(rfmodel)
This results in a matrix with the importance of each feature for each of the three classes.
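As a small sketch of how the per-class part of that matrix can be pulled out: for a classification forest fitted with importance = TRUE, the matrix has one column per class level, followed by MeanDecreaseAccuracy and MeanDecreaseGini. The name class_imp is only an illustrative choice.

```r
library(randomForest)

set.seed(100)
rfmodel <- randomForest(Species ~ ., data = iris, ntree = 1000, importance = TRUE)

# Full importance matrix: one column per class, then the two overall measures
imp <- importance(rfmodel)

# Keep only the per-class columns, selected by the class level names
class_imp <- imp[, levels(iris$Species)]
class_imp
```

Selecting by level names rather than by column position avoids relying on the column order.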
Alternatively, for ranger I am running:
library(ranger)
rangermodel <- ranger(Species ~ ., data = iris, num.trees = 1000, write.forest = TRUE, importance = "permutation", local.importance = TRUE)
importance(rangermodel)
rangermodel$variable.importance
rangermodel$variable.importance.local
rangermodel$variable.importance gives the importance of the features for the whole classification problem, but not by class, while rangermodel$variable.importance.local gives the importance for each case, but also not by class.
The ranger documentation does not seem to cover this. The only related question I could find is "How can I separate the overall variable importance values when using Random forest?", but it did not reach a conclusion on how to achieve this with ranger. Changing the ranger call as below did not give the output I am looking for either:
rangermodel <- ranger(Species ~ ., data = iris, num.trees = 1000, write.forest = TRUE, importance = "impurity")
The idea is to use local variable importance, defined as below:
"For each case, consider all the trees for which it is oob. Subtract the percentage of votes for the correct class in the variable-m-permuted oob data from the percentage of votes for the correct class in the untouched oob data. This is the local importance score for variable m for this case."
Source: Breiman and Cutler's website, section "Variable Importance".
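To make the definition concrete, here is a toy calculation for a single case. The vote counts are made up purely for illustration and do not come from any fitted model, and this is not how ranger computes the score internally.

```r
# Hypothetical numbers: one case that is out-of-bag (oob) in 40 trees
n_oob_trees <- 40
correct_votes_untouched <- 36  # oob trees voting for the true class, original data
correct_votes_permuted  <- 22  # same trees after permuting variable m

# Share of correct votes before minus share of correct votes after permutation
local_importance_m <- correct_votes_untouched / n_oob_trees -
  correct_votes_permuted / n_oob_trees
local_importance_m
```

A larger positive value means that permuting variable m hurts the prediction for this particular case more.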
To extract local variable importance from ranger, you need to specify both importance = "permutation" and local.importance = TRUE:
library(ranger)
rf.iris <- ranger(Species ~ ., iris, importance = "permutation",
                  local.importance = TRUE)
rf.iris$variable.importance.local
Then you can average the local importance values within each class:
library(data.table)
as.data.table(rf.iris$variable.importance.local)[, Species := iris$Species][, lapply(.SD, mean), by = Species]
      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1:     setosa      0.01316     0.00252      0.11192     0.12548
2: versicolor      0.00800     0.00120      0.10672     0.11112
3:  virginica      0.01352     0.00316      0.10632     0.09956
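If you would rather not depend on data.table, the same per-class averaging can be sketched in base R with aggregate(). The names local_imp and by_class are illustrative, and the seed is arbitrary, so the values will not match the table above exactly.

```r
library(ranger)

set.seed(100)  # arbitrary seed so the run is repeatable
rf.iris <- ranger(Species ~ ., data = iris, importance = "permutation",
                  local.importance = TRUE)

# One row per observation, one column per predictor
local_imp <- as.data.frame(rf.iris$variable.importance.local)

# Average the casewise importance values within each true class
by_class <- aggregate(local_imp, by = list(Species = iris$Species), FUN = mean)
by_class
```

Note that the grouping uses each case's true class, which matches the "votes for the correct class" wording in the definition quoted above.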