I'm using the caret package to analyse Random Forest models built using ranger. I can't figure out how to call the train function using the tuneGrid argument to tune the model parameters.
I think I'm calling the tuneGrid argument wrong, but can't figure out why it's wrong. Any help would be appreciated.
data(iris)
library(ranger)
model_ranger <- ranger(Species ~ ., data = iris, num.trees = 500, mtry = 4,
                       importance = 'impurity')
library(caret)
# my tuneGrid object:
tgrid <- expand.grid(
  num.trees = c(200, 500, 1000),
  mtry = 2:4
)
model_caret <- train(Species ~ ., data = iris,
                     method = "ranger",
                     trControl = trainControl(method="cv", number = 5, verboseIter = T, classProbs = T),
                     tuneGrid = tgrid,
                     importance = 'impurity'
)
Here is the syntax for ranger in caret:
library(caret)
Prefix each tuning parameter name with a dot (.):
tgrid <- expand.grid(
  .mtry = 2:4,
  .splitrule = "gini",
  .min.node.size = c(10, 20)
)
caret tunes only these three parameters for ranger; the number of trees is not one of them. Instead, pass num.trees and importance directly to train, which forwards them to ranger:
model_caret <- train(Species ~ ., data = iris,
                     method = "ranger",
                     trControl = trainControl(method = "cv", number = 5, verboseIter = TRUE, classProbs = TRUE),
                     tuneGrid = tgrid,
                     num.trees = 100,
                     importance = "permutation")
To get variable importance:
varImp(model_caret)
#output
             Overall
Petal.Length 100.0000
Petal.Width   84.4298
Sepal.Length   0.9855
Sepal.Width    0.0000
To check that num.trees is actually being passed through, set the number of trees to 1000 or more; the fit will be much slower. After changing to importance = "impurity", the variable importance becomes:
#output:
             Overall
Petal.Length  100.00
Petal.Width    81.67
Sepal.Length   16.19
Sepal.Width     0.00
If it does not work, I recommend installing the latest ranger from CRAN and caret from GitHub:
devtools::install_github('topepo/caret/pkg/caret')
To tune the number of trees, you can use lapply with fixed folds created by createMultiFolds or createFolds.
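A minimal sketch of that approach, assuming 5-fold cross-validation and an illustrative set of tree counts (the seed, the tree_counts vector, and the object names folds, ctrl, fits and best_acc are my own choices, not part of the original answer). Fixing the folds up front means every num.trees value is evaluated on identical resamples, so the accuracies are comparable:

```r
library(caret)
library(ranger)

data(iris)
set.seed(42)  # assumed seed, only for reproducible folds

# Create the training-set indices once and reuse them for every fit
folds <- createFolds(iris$Species, k = 5, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = folds)

# Grid over the parameters that caret tunes for ranger
tgrid <- expand.grid(
  mtry = 2:4,
  splitrule = "gini",
  min.node.size = c(10, 20)
)

# Hypothetical candidate values for the number of trees
tree_counts <- c(200, 500, 1000)

# One train() call per num.trees value, all sharing the same folds
fits <- lapply(tree_counts, function(nt) {
  train(Species ~ ., data = iris,
        method = "ranger",
        trControl = ctrl,
        tuneGrid = tgrid,
        num.trees = nt)
})
names(fits) <- paste0("trees_", tree_counts)

# Best cross-validated accuracy found for each num.trees value
best_acc <- sapply(fits, function(f) max(f$results$Accuracy))
best_acc
```

The element of fits with the highest best_acc gives the preferred num.trees, and its bestTune slot holds the matching mtry, splitrule and min.node.size.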
EDIT: While the above example works with caret version 6.0-84, using the hyperparameter names without the leading dots works as well:
tgrid <- expand.grid(
  mtry = 2:4,
  splitrule = "gini",
  min.node.size = c(10, 20)
)
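If you are unsure which column names tuneGrid accepts for a given method, caret's modelLookup function lists the parameters it tunes. A small sketch (the variable name params is mine; the exact rows returned may depend on the installed caret version):

```r
library(caret)

# List the tuning parameters caret exposes for method = "ranger"
params <- modelLookup("ranger")

# The 'parameter' column holds the names expected as tuneGrid columns
params$parameter
```

Anything not listed there, such as num.trees, has to be passed to train directly rather than through tuneGrid.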