Here is my code:
set.seed(1)
#Boruta on the HouseVotes84 data from mlbench
library(mlbench) #has HouseVotes84 data
library(h2o) #has rf
#spin up h2o
myh2o <- h2o.init(nthreads = -1)
#read in data, throw some away
data(HouseVotes84)
hvo <- na.omit(HouseVotes84)
#move from R to h2o
mydata <- as.h2o(x = hvo,
                 destination_frame = "mydata")
#RF columns (input vs. output)
idxy <- 1
idxx <- 2:ncol(hvo)
#split data
splits <- h2o.splitFrame(mydata,
                         c(0.8, 0.1))
train <- h2o.assign(splits[[1]], key="train")
valid <- h2o.assign(splits[[2]], key="valid")
# make random forest
my_imp.rf <- h2o.randomForest(y = idxy, x = idxx,
                              training_frame = train,
                              validation_frame = valid,
                              model_id = "my_imp.rf",
                              ntrees = 200)
# find importance
my_varimp <- h2o.varimp(my_imp.rf)
my_varimp
The output I am getting is labeled "variable importance".
The classic measures I know of are "mean decrease in accuracy" and "mean decrease in Gini coefficient".
My results are:
> my_varimp
Variable Importances:
variable relative_importance scaled_importance percentage
1 V4 3255.193604 1.000000 0.410574
2 V5 1131.646484 0.347643 0.142733
3 V3 921.106567 0.282965 0.116178
4 V12 759.443176 0.233302 0.095788
5 V14 492.264954 0.151224 0.062089
6 V8 342.811554 0.105312 0.043238
7 V11 205.392654 0.063097 0.025906
8 V9 191.110046 0.058709 0.024105
9 V7 169.117676 0.051953 0.021331
10 V15 135.097076 0.041502 0.017040
11 V13 114.906586 0.035299 0.014493
12 V2 51.939777 0.015956 0.006551
13 V10 46.716656 0.014351 0.005892
14 V6 44.336708 0.013620 0.005592
15 V16 34.779987 0.010684 0.004387
16 V1 32.528778 0.009993 0.004103
From this, the relative importance of "Vote #4" (V4) is ~3255.2.
Questions: What units is that in? How is that derived?
I looked in the documentation and the help pages, and I used Flow to inspect the model parameters, but I could not find the answer. None of them mention "gini" or "decrease in accuracy". Where should I look?
The answer is in the docs.
[In the left pane, click on "Algorithms", then "Supervised", then "DRF". The FAQ section answers this question.]
For convenience, the answer is also copied and pasted here:
"How is variable importance calculated for DRF? Variable importance is determined by calculating the relative influence of each variable: whether that variable was selected during splitting in the tree building process and how much the squared error (over all trees) improved as a result."
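Reading that FAQ entry, relative_importance looks like an accumulated squared-error improvement rather than a Gini or accuracy measure, so it has no natural unit and is mostly useful for comparing variables within the same model. The other two columns appear to be plain normalizations of it. Here is a minimal sketch in base R that checks this against the numbers from the question (the values are copied from the printed table; the names rel, scaled and pct are just for illustration, and the normalization rule is my reading of the output, not something stated in the quoted FAQ):

```r
# relative_importance values copied from the output in the question
rel <- c(V4 = 3255.193604, V5 = 1131.646484, V3 = 921.106567,
         V12 = 759.443176, V14 = 492.264954, V8 = 342.811554,
         V11 = 205.392654, V9 = 191.110046, V7 = 169.117676,
         V15 = 135.097076, V13 = 114.906586, V2 = 51.939777,
         V10 = 46.716656, V6 = 44.336708, V16 = 34.779987,
         V1 = 32.528778)

# assumption: scaled_importance is each value divided by the largest value
scaled <- rel / max(rel)

# assumption: percentage is each value divided by the sum of all values
pct <- rel / sum(rel)

# compare with the scaled_importance and percentage columns shown above
print(data.frame(variable = names(rel),
                 relative_importance = rel,
                 scaled_importance = round(scaled, 6),
                 percentage = round(pct, 6),
                 row.names = NULL))
```

If the rounded values line up with the table in the question, then scaled_importance is relative to the top variable and percentage is the share of the total, which is usually the easier column to interpret.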