using caret package to find optimal parameters of GBM

I'm using the R gbm package to run boosted regression on biological data with dimensions 10,000 x 932, and I want to know the best settings for the gbm parameters, especially n.trees, shrinkage, interaction.depth, and n.minobsinnode. Searching online, I found that the caret package for R can find such settings. However, I'm having difficulty using caret with gbm, so I'd like to know how to use caret to find the optimal combination of these parameters. I know this may seem like a very typical question, but I have read the caret manual and still have trouble integrating caret with gbm, especially because I'm very new to both packages.

asked Mar 25 '13 11:03

DOSMarter

1 Answer

Not sure if you found what you were looking for, but I find some of these sheets less than helpful.

If you are using the caret package, the following call lists the tuning parameters it requires for gbm: getModelInfo()$gbm$parameters

Here are some rules of thumb for running GBM:

interaction.depth: the default is 1, and on most data sets that seems adequate, but on a few I have found that testing odd values up to the maximum gives better results. The maximum value I have seen for this parameter is floor(sqrt(NCOL(training))).
shrinkage: the smaller the number, the better the predictive value, but the more trees are required and the higher the computational cost. Testing values on a small subset of the data with something like shrinkage = seq(.0005, .05, .0005) can help identify the ideal value.
n.minobsinnode: the default is 10, and I generally don't change it. I have tried c(5, 10, 15, 20) on small data sets and didn't see an adequate return for the computational cost.
n.trees: the smaller the shrinkage, the more trees you need. Start with n.trees = (0:50)*50 and adjust accordingly.
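
The rules of thumb above can be turned into candidate values with base R alone. This is only a sketch under assumptions: `training` here is a hypothetical stand-in data frame (a few rows, with the 932 columns mentioned in the question), the variable names are made up for illustration, and stepping through odd depths is one reading of "odd values up to the maximum".

```r
# Hypothetical stand-in for the real data: only its dimensions matter here.
training <- as.data.frame(matrix(0, nrow = 10, ncol = 932))

# Largest interaction.depth worth trying, per the rule of thumb above.
max_depth <- floor(sqrt(NCOL(training)))

# Odd candidate depths from 1 up to that maximum.
depth_candidates <- seq(1, max_depth, by = 2)

# Candidate values for the remaining parameters, as suggested above.
shrinkage_candidates <- seq(.0005, .05, .0005)
tree_candidates <- (0:50) * 50
minobs_candidates <- c(5, 10, 15, 20)

# Inspect how many values each parameter contributes to a grid.
max_depth
length(depth_candidates)
length(shrinkage_candidates)
range(tree_candidates)
```

In practice you would pick a handful of values from each vector rather than all of them, since every extra value multiplies the size of the tuning grid.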

Example setup using the caret package:

library(caret)
library(parallel)
library(doMC)             # parallel backend (relies on forking, so not for Windows)
registerDoMC(cores = 20)  # adjust to the number of cores on your machine

# List the tuning parameters caret expects for gbm
getModelInfo()$gbm$parameters

# Rule-of-thumb maximum shrinkage for gbm
nl <- nrow(training)
max(0.01, 0.1 * min(1, nl / 10000))
# Rule-of-thumb maximum value for interaction.depth
floor(sqrt(NCOL(training)))

gbmGrid <- expand.grid(interaction.depth = c(1, 3, 6, 9, 10),
                       n.trees = (0:50) * 50,
                       shrinkage = seq(.0005, .05, .0005),
                       n.minobsinnode = 10) # you can also try something like c(5, 10, 15, 20)

fitControl <- trainControl(method = "repeatedcv",
                           repeats = 5,
                           preProcOptions = list(thresh = 0.95),
                           ## Estimate class probabilities
                           classProbs = TRUE,
                           ## Evaluate performance using
                           ## the following function
                           summaryFunction = twoClassSummary)

# Object name convention: method + date + distribution
set.seed(1)
system.time(GBM0604ada <- train(Outcome ~ ., data = training,
                                distribution = "adaboost",
                                method = "gbm", bag.fraction = 0.5,
                                nTrain = round(nrow(training) * .75),
                                trControl = fitControl,
                                verbose = TRUE,
                                tuneGrid = gbmGrid,
                                ## Specify which metric to optimize
                                metric = "ROC"))

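Things can change depending on your data (the distribution, for example), but I have found that the key is to experiment with gbmGrid until you get the outcome you are looking for. Note that the example is set up for two-class classification; for a regression problem like the one in the question, you would use a distribution such as "gaussian", drop classProbs and twoClassSummary, and optimize a metric such as "RMSE". The settings as shown would take a long time to run, so modify them as your machine and time allow. To give you a ballpark for the computation, I run on a 12-core Mac Pro with 64 GB of RAM.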
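
To get a feel for why the full grid is slow, it helps to count the model fits before starting. The sketch below uses only base R and rests on two assumptions: that caret fits gbm once per combination of the other parameters at the largest n.trees and derives the smaller tree counts from that fit (my understanding of how caret handles gbm), and that the number of folds is the trainControl() default of 10. The coarse grid values are hypothetical, chosen only to illustrate a cheaper first pass.

```r
# Grid from the example, minus n.trees (assumed to be covered by a single fit).
param_combos <- nrow(expand.grid(interaction.depth = c(1, 3, 6, 9, 10),
                                 shrinkage = seq(.0005, .05, .0005),
                                 n.minobsinnode = 10))

folds <- 10    # assumed: trainControl() default for "repeatedcv"
repeats <- 5   # as set in fitControl

# Each parameter combination is refit on every resample.
resamples <- folds * repeats
total_fits <- param_combos * resamples

# A coarser, hypothetical grid for a first pass.
coarse_combos <- nrow(expand.grid(interaction.depth = c(1, 3, 9),
                                  shrinkage = c(.001, .01, .05),
                                  n.minobsinnode = 10))
coarse_fits <- coarse_combos * resamples

c(full = total_fits, coarse = coarse_fits)
```

Starting with a coarse grid and then refining around the best region is one way to keep the run time manageable; each fit in either grid still grows up to 2,500 trees.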

answered Sep 30 '22 19:09

Shanemeister


Tags: optimization, r, r-caret, gbm
