
Issues with tuneGrid parameter in random forest

I've been dealing with some extremely imbalanced data, and I would like to use stratified sampling to create more balanced random forests.

Right now, I'm using the caret package, mainly for tuning the random forests. I tried to set up a tuneGrid to pass the mtry and sampsize parameters into caret's train method as follows.

mtryGrid <- data.frame(.mtry = 100),.sampsize=80)
rfTune<- train(x = trainX,
               y = trainY,
               method = "rf",
               trControl = ctrl,
               metric = "Kappa",
               ntree = 1000,
               tuneGrid = mtryGrid,
               importance = TRUE)

When I run this example, I get the following error

The tuning parameter grid should have columns mtry

I've come across discussions like this suggesting that passing in these parameters should be possible.

On the other hand, this page suggests that the only parameter that can be passed in is mtry

Can I even pass sampsize to the random forest via caret?

asked Nov 12 '14 02:11 by mortonjt


1 Answer

It looks like there is a bracket issue in your mtryGrid: the stray closing parenthesis after .mtry = 100 ends the data.frame call early. You can also use expand.grid to list the different values of mtry you want to try. By default, the only parameter you can tune for a random forest is mtry, which is why the grid may contain only an mtry column. However, you can still pass the other parameters to train; they will have a fixed value and so won't be tuned by train. This means you can still ask train to use a stratified sample. Below is how I would do it, assuming that trainY is a boolean variable by which you want to stratify your samples, and that you want samples of size 80 for each category:

library(caret)

mtryGrid <- expand.grid(mtry = 100) # you can put different values for mtry
rfTune <- train(x = trainX,
                y = trainY,
                method = "rf",
                trControl = ctrl,
                metric = "Kappa",
                ntree = 1000,
                tuneGrid = mtryGrid,         # only mtry is tuned
                strata = factor(trainY),     # fixed value, passed on to randomForest
                sampsize = c(80, 80),        # fixed value: 80 samples per class
                importance = TRUE)
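
For anyone who wants to try this end to end, here is a minimal, self-contained sketch on made-up data. The toy data set, the class sizes, the mtry values and the per-class sampsize of 20 are all assumptions chosen for illustration, not values from the question. It also leaves out strata and relies on randomForest stratifying on y when sampsize has more than one element, which is my understanding of its default behaviour rather than something confirmed above.

```r
library(caret)         # train(), trainControl()
library(randomForest)  # backend used by method = "rf"

set.seed(42)

# Hypothetical imbalanced toy data: 450 "no" rows and 50 "yes" rows.
n <- 500
trainY <- factor(rep(c("no", "yes"), times = c(450, 50)),
                 levels = c("no", "yes"))
trainX <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
# Shift one predictor for the minority class so there is some signal.
trainX$V1 <- trainX$V1 + ifelse(trainY == "yes", 1.5, 0)

# 5-fold cross-validation; the default summary reports Accuracy and Kappa.
ctrl <- trainControl(method = "cv", number = 5)

# The grid holds only mtry, the one parameter caret tunes for "rf".
mtryGrid <- expand.grid(mtry = c(2, 4, 6))

rfTune <- train(x = trainX,
                y = trainY,
                method = "rf",
                trControl = ctrl,
                metric = "Kappa",
                ntree = 200,
                tuneGrid = mtryGrid,
                # Fixed, not tuned: draw 20 rows per class for every tree.
                # Kept below the minority count in each CV training fold.
                sampsize = c(20, 20))

print(rfTune$results[, c("mtry", "Accuracy", "Kappa")])
print(rfTune$bestTune)
```

The elements of sampsize follow the order of the factor levels of trainY, so the first 20 applies to "no" and the second to "yes". Each value has to stay at or below the number of rows of that class in the resampled training set, otherwise randomForest stops with an error.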
answered Sep 30 '22 13:09 by Garnieje