
Using AdaBoost within R's caret package

I've been using the ada R package for a while and, more recently, caret. According to the documentation, caret's train() function should have an option that uses ada. But caret throws errors at me when I use the same syntax that works inside my ada() call.

Here's a demonstration, using the wine sample data set.

library(doSNOW)
registerDoSNOW(makeCluster(2, type = "SOCK"))  # parallel backend for resampling
library(caret)
library(ada)

wine = read.csv("http://www.nd.edu/~mclark19/learn/data/goodwine.csv")

set.seed(1234)  # so that the indices will be the same when re-run
trainIndices = createDataPartition(wine$good, p = 0.8, list = F)
wanted = !colnames(wine) %in% c("free.sulfur.dioxide", "density", "quality",
                                "color", "white")

wine_train = wine[trainIndices, wanted]
wine_test = wine[-trainIndices, wanted]
cv_opts = trainControl(method = "cv", number = 10)


### Now, the example that works using ada()

results_ada <- ada(good ~ ., data = wine_train,
                   control = rpart.control(maxdepth = 30, cp = 0.01,
                                           minsplit = 20, xval = 10),
                   iter = 500)

## this works, and gives me a confusion matrix.

results_ada
     ada(good ~ ., data = wine_train, control = rpart.control(maxdepth = 30, 
     cp = 0.01, minsplit = 20, xval = 10), iter = 500)
     Loss: exponential Method: discrete   Iteration: 500 
      Final Confusion Matrix for Data:
      Final Prediction
      etc. etc. etc. etc.

## Now, the calls that don't work.

results_ada = train(good ~ ., data = wine_train, method = "ada",
                    control = rpart.control(maxdepth = 30, cp = 0.01,
                                            minsplit = 20, xval = 10),
                    iter = 500)
   Error in train.default(x, y, weights = w, ...) : 
   final tuning parameters could not be determined
   In addition: Warning messages:
   1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
    There were missing values in resampled performance measures.
   2: In train.default(x, y, weights = w, ...) :
    missing values found in aggregated results

### This doesn't work, either

results_ada = train(good ~ ., data = wine_train, method = "ada", trControl = cv_opts,
                    maxdepth = 10, nu = 0.1, iter = 50)

  Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
  In addition: Warning messages:
  1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
    There were missing values in resampled performance measures.
  2: In train.default(x, y, weights = w, ...) :
   missing values found in aggregated results

I'm guessing that train() wants additional input, but the warnings thrown don't give me any hints about what's missing. I could also be missing a dependency, but there's no indication of what it should be.

asked Oct 11 '13 by Bryan


3 Answers

Look up ?train and search for ada; you'll see that:

Method Value: ada from package ada with tuning parameters: iter, maxdepth, nu (classification only)

So you must be missing the nu and maxdepth parameters.
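If you want to see which parameters train() expects to tune for a given method, caret's modelLookup() helper lists them; a minimal sketch:

library(caret)

# Lists the tunable parameters for method = "ada":
# iter, maxdepth and nu (classification only).
modelLookup("ada")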

answered Sep 28 '22 by nograpes


So this seems to work:

wineTrainInd <- wine_train[!colnames(wine_train) %in% "good"]  # predictors only
wineTrainDep <- as.factor(wine_train$good)                     # outcome as a factor

results_ada = train(x = wineTrainInd, y = wineTrainDep, method = "ada")

results_ada
Boosted Classification Trees 

5199 samples
   9 predictors
   2 classes: 'Bad', 'Good' 

No pre-processing
Resampling: Bootstrapped (25 reps) 

Summary of sample sizes: 5199, 5199, 5199, 5199, 5199, 5199, ... 

Resampling results across tuning parameters:

  iter  maxdepth  Accuracy  Kappa  Accuracy SD  Kappa SD
  50    1         0.732     0.397  0.00893      0.0294  
  50    2         0.74      0.422  0.00853      0.0187  
  50    3         0.747     0.437  0.00759      0.0171  
  100   1         0.736     0.411  0.0065       0.0172  
  100   2         0.742     0.428  0.0075       0.0173  
  100   3         0.748     0.442  0.00756      0.0158  
  150   1         0.737     0.417  0.00771      0.0184  
  150   2         0.745     0.435  0.00851      0.0198  
  150   3         0.752     0.449  0.00736      0.016   

Tuning parameter 'nu' was held constant at a value of 0.1
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were iter = 150, maxdepth = 3 and nu
 = 0.1.
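To get a confusion matrix like the one the plain ada() call printed, a small follow-up sketch (assuming wine_test still contains the good column with the same two class labels):

# Predict on the held-out set and tabulate against the observed classes.
preds <- predict(results_ada, newdata = wine_test)
confusionMatrix(preds, as.factor(wine_test$good))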

And the reason is found in another question:

caret::train: specify model-generation-parameters

I think you passed tuning parameters as ordinary arguments, while train() is attempting to find the optimal tuning parameters itself. If you do want to set them yourself, you can define a grid of parameters for a grid search, as sketched below.
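For example, a hedged sketch of a user-defined grid, reusing wineTrainInd, wineTrainDep and cv_opts from above (the values are illustrative, not recommendations):

# Grid column names must match the tuning parameters (iter, maxdepth, nu).
# Very old caret versions expected a leading dot (.iter, .maxdepth, .nu).
ada_grid <- expand.grid(iter = c(50, 100, 150),
                        maxdepth = c(1, 2, 3),
                        nu = 0.1)

results_ada <- train(x = wineTrainInd, y = wineTrainDep,
                     method = "ada",
                     trControl = cv_opts,   # the 10-fold CV control defined in the question
                     tuneGrid = ada_grid)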

answered Sep 28 '22 by TomR


What is the type of data in wine$good? If it is supposed to be a factor, make that explicit:

wine$good <- as.factor(wine$good)
stopifnot(is.factor(wine$good))

Reason: R packages often need some help in distinguishing classification from regression scenarios, and there may be generic code inside caret that is mistakenly treating the exercise as a regression problem (ignoring the fact that ada only does classification).
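A quick sanity check along those lines (purely illustrative):

# Confirm the outcome is a two-level factor before calling train();
# a character or numeric column can push caret toward regression.
str(wine$good)
levels(wine$good)   # should show the two class labels, e.g. "Bad" / "Good"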

answered Sep 28 '22 by vijucat