I am facing a difficulty with filtering out the least important variables in my model. I received a set of data with more than 4,000 variables, and I have been asked to reduce the number of variables getting into the model.
I have already tried two approaches, but both failed.
The first thing I tried was to manually check variable importance after the modelling and, based on that, remove the non-significant variables.
    library(dplyr)

    # reproducible example
    data <- iris
    # artificial class imbalance
    data <- iris %>%
      mutate(Species = as.factor(ifelse(Species == "virginica", "1", "0")))
Everything works fine when using a simple Learner:
    library(mlr3)
    library(mlr3learners)

    # creating Task
    task <- TaskClassif$new(id = "score", backend = data, target = "Species", positive = "1")
    # creating Learner
    lrn <- lrn("classif.xgboost")
    # setting scoring as prediction type
    lrn$predict_type = "prob"
    lrn$train(task)
    lrn$importance()
    #>  Petal.Width Petal.Length
    #>   0.90606304   0.09393696
The issue is that the data is highly imbalanced, so I decided to use a PipeOp operator to undersample the majority group, which is then passed to an AutoTuner.
I skipped some parts of the code that I believe are not important for this case, such as the search space, terminator, tuner, etc.
    library(mlr3pipelines)
    library(mlr3tuning)

    # undersampling
    po_under <- po("classbalancing", id = "undersample", adjust = "major",
      reference = "major", shuffle = FALSE, ratio = 1 / 2)
    # combine learner with pipeline graph
    lrn_under <- GraphLearner$new(po_under %>>% lrn)
    # setting the AutoTuner
    # (resample, measure, ps_under, terminator and tuner come from the skipped code)
    at <- AutoTuner$new(
      learner = lrn_under,
      resampling = resample,
      measure = measure,
      search_space = ps_under,
      terminator = terminator,
      tuner = tuner
    )
    at$train(task)
The problem right now is that, although the importance property is still visible in the printed properties of at, $importance() is unavailable.
    > at
    <AutoTuner:undersample.classif.xgboost.tuned>
    * Model: list
    * Parameters: list()
    * Packages: -
    * Predict Type: prob
    * Feature types: logical, integer, numeric, character, factor, ordered, POSIXct
    * Properties: featureless, importance, missings, multiclass, oob_error, selected_features, twoclass, weights
So I decided to change my approach and try to add filtering to a Learner. And that's where I failed even more. I started by looking into the mlr3book - https://mlr3book.mlr-org.com/fs.html. I tried to add importance = "impurity" to the Learner, just like in the book, but it yielded an error.
    > lrn <- lrn("classif.xgboost", importance = "impurity")
    Error in instance[[nn]] <- dots[[i]] :
      cannot change value of locked binding for 'importance'

(The message above is translated from my Polish R session.)
I also tried a workaround with PipeOp filtering, but it failed miserably as well. I believe I won't be able to do it without importance = "impurity".
So my question is, is there a way to achieve what I am aiming for?
In addition, I would be very grateful for an explanation of why filtering by importance is possible before modelling. Shouldn't it be based on the model result?
The reason why you can't access $importance of the at variable is that it is an AutoTuner, which does not directly offer variable importance and only "wraps" around the actual Learner being tuned.
The trained GraphLearner is saved inside your AutoTuner under $learner:
    # get the trained GraphLearner, with tuned hyperparameters
    graphlearner <- at$learner
This object also does not have $importance(). (Theoretically, a GraphLearner could contain more than one Learner, and then it wouldn't even know which importance to give!)
Getting the actual LearnerClassifXgboost object is a bit tedious, unfortunately, because of shortcomings in the "R6" object system used by mlr3: you have to get the untrained Learner object, then get the trained state of the Learner and put it into that object.
    # get the untrained Learner
    xgboostlearner <- graphlearner$graph$pipeops$classif.xgboost$learner
    # put the trained model into the Learner
    xgboostlearner$state <- graphlearner$model$classif.xgboost
Now the importance can be queried with xgboostlearner$importance().
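Putting these steps together, here is a minimal self-contained sketch. To keep it short it trains the GraphLearner directly instead of going through an AutoTuner (with an AutoTuner you would start from at$learner as shown above). The nrounds = 10 value, the variable name xgb, and the clone() call are my own additions for illustration, and I have not checked this against every mlr3 version.

```r
library(mlr3)
library(mlr3learners)   # provides classif.xgboost (requires the xgboost package)
library(mlr3pipelines)

# same artificial two-class task as in the question, built without dplyr
data <- iris
data$Species <- factor(ifelse(data$Species == "virginica", "1", "0"))
task <- TaskClassif$new(id = "score", backend = data,
  target = "Species", positive = "1")

# nrounds = 10 is an arbitrary illustrative value, not a recommendation
xgb <- lrn("classif.xgboost", nrounds = 10, predict_type = "prob")
po_under <- po("classbalancing", id = "undersample", adjust = "major",
  reference = "major", shuffle = FALSE, ratio = 1 / 2)

graphlearner <- GraphLearner$new(po_under %>>% xgb)
graphlearner$train(task)

# take a copy of the untrained Learner stored in the graph ...
xgboostlearner <- graphlearner$graph$pipeops$classif.xgboost$learner$clone(deep = TRUE)
# ... and give it the trained state kept in the GraphLearner's model
xgboostlearner$state <- graphlearner$model$classif.xgboost

imp <- xgboostlearner$importance()
print(imp)
```

Cloning is optional; it only avoids modifying the Learner object that still lives inside the graph.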
The example from the book that you link to does not work in your case because the book uses the ranger Learner, while you are using xgboost. importance = "impurity" is specific to ranger.
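As for filtering inside a pipeline: an importance filter does not need the final model, because it fits its own copy of a Learner on the training data first and uses the resulting importance scores to decide which features to pass on. The following is only a sketch of how that could look with xgboost, assuming the mlr3filters package is installed. The filter.frac = 0.5 and nrounds = 10 values are placeholders I picked for the iris example, not tuned settings.

```r
library(mlr3)
library(mlr3learners)   # classif.xgboost (requires the xgboost package)
library(mlr3pipelines)
library(mlr3filters)

data <- iris
data$Species <- factor(ifelse(data$Species == "virginica", "1", "0"))
task <- TaskClassif$new(id = "score", backend = data,
  target = "Species", positive = "1")

# The filter trains its own xgboost model and ranks features by importance;
# filter.frac = 0.5 keeps the top half of the features (placeholder value).
po_filter <- po("filter",
  filter = flt("importance", learner = lrn("classif.xgboost", nrounds = 10)),
  filter.frac = 0.5)

# The final model is then trained only on the features that passed the filter.
final_lrn <- lrn("classif.xgboost", nrounds = 10, predict_type = "prob")
glrn <- GraphLearner$new(po_filter %>>% final_lrn)
glrn$train(task)

# Features kept by the filter step (the PipeOp id defaults to the filter id)
kept <- glrn$model$importance$features
print(kept)
```

With 4,000 variables you would probably want to treat filter.frac (or filter.nfeat) as a hyperparameter and tune it together with the learner, and the undersampling PipeOp can be chained in front of the filter in the same way.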