Plot learning curves with caret package and R

Tags: plot, r, machine-learning, supervised-learning

I would like to study the optimal tradeoff between bias/variance for model tuning. I'm using caret for R which allows me to plot the performance metric (AUC, accuracy...) against the hyperparameters of the model (mtry, lambda, etc.) and automatically chooses the max. This typically returns a good model, but if I want to dig further and choose a different bias/variance tradeoff I need a learning curve, not a performance curve.

For the sake of simplicity, let's say my model is a random forest, which has just one hyperparameter, 'mtry'.

I would like to plot the learning curves of both training and test sets. Something like this:

[Image: learning curve]

(red curve is the test set)

On the y axis I put an error metric (number of misclassified examples or something like that); on the x axis 'mtry' or alternatively the training set size.

Questions:

1. Does caret have the functionality to iteratively train models on training set folds of different sizes? If I have to code it by hand, how can I do that?
2. If I want to put the hyperparameter on the x axis, I need all the models trained by caret::train, not just the final model (the one with the best performance after CV). Are these "discarded" models still available after train?

asked Dec 04 '13 08:12

Gabriele B

1 Answer

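1. caret will fit and evaluate lots of cross-validated models for you: describe the resampling scheme with trainControl() and list the candidate hyperparameter values (e.g. mtry) in a tuning grid, then pass both to train() as its trControl and tuneGrid arguments. The tunable parameters are different for each model type; for method = "rf" only mtry is tuned, while ntree is simply passed through to randomForest().
2. Yes, in the sense that the object returned by train() (call it trainFit) keeps the resampled performance, in whatever metric you specified, for every candidate value in trainFit$results, not just for the winner. Per-fold values are in trainFit$resample; set returnResamp = "all" in trainControl() to keep them for every candidate rather than only the final one. The fitted model objects for the discarded candidates are not retained, only their performance.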
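
To make the two points above concrete, here is a minimal sketch. It assumes the built-in iris data, 5-fold CV and a grid of four mtry values; the object names (ctrl, grid, fit) are only illustrative and the randomForest package must be installed.

```r
library(caret)   # method = "rf" also needs the randomForest package

set.seed(42)
data(iris)

# 5-fold CV; returnResamp = "all" keeps per-fold metrics for every
# mtry value, not only for the winning one
ctrl <- trainControl(method = "cv", number = 5, returnResamp = "all")

# For method = "rf", mtry is the only parameter caret tunes
grid <- expand.grid(mtry = 1:4)

fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = ctrl,
             tuneGrid = grid,
             ntree = 100)   # passed through to randomForest(), not tuned

# One row per mtry value: metrics averaged over the folds
print(fit$results)

# One row per (mtry, fold) combination: 4 * 5 = 20 rows
print(head(fit$resample))

# Cross-validated error against the hyperparameter
plot(fit$results$mtry, 1 - fit$results$Accuracy, type = "b",
     xlab = "mtry", ylab = "CV misclassification rate")
```

Note that this gives the held-out (cross-validated) curve only; train() does not report the matching training-set error for each candidate.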

So you could specify, for example, 10-fold CV over a grid of 10 mtry values, which means 10 x 10 = 100 model fits (plus one final fit on the full training set). You might want to go get a cup of tea, or possibly lunch.

If this sounds complicated, there is a very good example here; caret is one of the best-documented packages around.
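
For the other x axis the question mentions (training set size), one way to code it by hand is sketched below. This is only an illustration under assumptions: the iris data, a fixed mtry = 2, a 70/30 train/test split and five subset fractions are all arbitrary choices, and the names (train_all, test_set, curve) are made up for the example.

```r
library(caret)   # method = "rf" also needs the randomForest package

set.seed(42)
data(iris)

# Hold out a fixed test set, stratified by class
in_train  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
train_all <- iris[in_train, ]
test_set  <- iris[-in_train, ]

fractions <- c(0.2, 0.4, 0.6, 0.8, 1)

# method = "none": no resampling, just one fit per subset at a fixed mtry
ctrl <- trainControl(method = "none")

curve <- do.call(rbind, lapply(fractions, function(f) {
  # Stratified subset of the training data (all rows when f == 1)
  idx <- if (f < 1) {
    createDataPartition(train_all$Species, p = f, list = FALSE)[, 1]
  } else {
    seq_len(nrow(train_all))
  }
  sub <- train_all[idx, ]
  fit <- train(Species ~ ., data = sub, method = "rf",
               trControl = ctrl, tuneGrid = data.frame(mtry = 2))
  data.frame(
    n           = nrow(sub),
    train_error = mean(predict(fit, sub) != sub$Species),
    test_error  = mean(predict(fit, test_set) != test_set$Species)
  )
}))

print(curve)

# Training curve in blue, test curve in red
matplot(curve$n, as.matrix(curve[, c("train_error", "test_error")]),
        type = "b", pch = 1, lty = 1, col = c("blue", "red"),
        xlab = "training set size", ylab = "misclassification rate")
legend("topright", legend = c("training", "test"),
       col = c("blue", "red"), lty = 1)
```

Expect the training error of a random forest to sit at or near zero when it is scored on the very rows it was fitted on, so the test curve is usually the informative one. Newer caret releases also appear to ship a learning_curve_dat() helper for this purpose; check the documentation of your installed version before relying on it.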

answered Sep 18 '22 07:09

Stephen Henderson
