
Using AdaBoost within R's caret package

I've been using the ada R package for a while and, more recently, caret. According to the documentation, caret's train() function should have an option that uses ada. But caret throws errors at me when I use the same syntax that works inside my ada() call.

Here's a demonstration, using the wine sample data set.

library(doSNOW)
registerDoSNOW(makeCluster(2, type = "SOCK"))  # parallel backend for resampling
library(caret)
library(ada)

wine = read.csv("http://www.nd.edu/~mclark19/learn/data/goodwine.csv")

set.seed(1234)  # so that the indices will be the same when re-run
trainIndices = createDataPartition(wine$good, p = 0.8, list = F)
wanted = !colnames(wine) %in% c("free.sulfur.dioxide", "density", "quality",
                                "color", "white")

wine_train = wine[trainIndices, wanted]
wine_test = wine[-trainIndices, wanted]
cv_opts = trainControl(method = "cv", number = 10)


### Now, the example that works using ada()

results_ada <- ada(good ~ ., data = wine_train,
                   control = rpart.control(maxdepth = 30, cp = 0.01,
                                           minsplit = 20, xval = 10),
                   iter = 500)

## this works, and gives me a confusion matrix.

results_ada
     ada(good ~ ., data = wine_train, control = rpart.control(maxdepth = 30, 
     cp = 0.01, minsplit = 20, xval = 10), iter = 500)
     Loss: exponential Method: discrete   Iteration: 500 
      Final Confusion Matrix for Data:
      Final Prediction
      etc. etc. etc. etc.

## Now, the calls that don't work.

results_ada = train(good ~ ., data = wine_train, method = "ada",
                    control = rpart.control(maxdepth = 30, cp = 0.01,
                                            minsplit = 20, xval = 10),
                    iter = 500)
   Error in train.default(x, y, weights = w, ...) : 
   final tuning parameters could not be determined
   In addition: Warning messages:
   1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
    There were missing values in resampled performance measures.
   2: In train.default(x, y, weights = w, ...) :
    missing values found in aggregated results

### This doesn't work, either

results_ada = train(good ~ ., data = wine_train, method = "ada", trControl = cv_opts,
                    maxdepth = 10, nu = 0.1, iter = 50)

  Error in train.default(x, y, weights = w, ...) : 
  final tuning parameters could not be determined
  In addition: Warning messages:
  1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method,  :
    There were missing values in resampled performance measures.
  2: In train.default(x, y, weights = w, ...) :
   missing values found in aggregated results

I'm guessing that train() wants additional input, but the warnings thrown don't give me any hints about what's missing. I could also be missing a dependency, but there's no indication of what it should be.

asked Oct 11 '13 by Bryan


3 Answers

Look up ?train and search for ada; you'll see that:

Method Value: ada from package ada with tuning parameters: iter, maxdepth, nu (classification only)

So you must be missing the nu and maxdepth parameters.
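If you want to see which parameters train() expects to tune for a given method, caret's modelLookup() helper lists them; a minimal sketch:

library(caret)

# Lists the tunable parameters for method = "ada":
# iter, maxdepth and nu (classification only).
modelLookup("ada")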

answered Sep 28 '22 by nograpes


So this seems to work:

wineTrainInd <- wine_train[!colnames(wine_train) %in% "good"]  # predictors only
wineTrainDep <- as.factor(wine_train$good)                     # outcome as a factor

results_ada = train(x = wineTrainInd, y = wineTrainDep, method = "ada")

results_ada
Boosted Classification Trees 

5199 samples
   9 predictors
   2 classes: 'Bad', 'Good' 

No pre-processing
Resampling: Bootstrapped (25 reps) 

Summary of sample sizes: 5199, 5199, 5199, 5199, 5199, 5199, ... 

Resampling results across tuning parameters:

  iter  maxdepth  Accuracy  Kappa  Accuracy SD  Kappa SD
  50    1         0.732     0.397  0.00893      0.0294  
  50    2         0.74      0.422  0.00853      0.0187  
  50    3         0.747     0.437  0.00759      0.0171  
  100   1         0.736     0.411  0.0065       0.0172  
  100   2         0.742     0.428  0.0075       0.0173  
  100   3         0.748     0.442  0.00756      0.0158  
  150   1         0.737     0.417  0.00771      0.0184  
  150   2         0.745     0.435  0.00851      0.0198  
  150   3         0.752     0.449  0.00736      0.016   

Tuning parameter 'nu' was held constant at a value of 0.1
Accuracy was used to select the optimal model using  the largest value.
The final values used for the model were iter = 150, maxdepth = 3 and nu
 = 0.1.
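To get a confusion matrix like the one the plain ada() call printed, a small follow-up sketch (assuming wine_test still contains the good column with the same two class labels):

# Predict on the held-out set and tabulate against the observed classes.
preds <- predict(results_ada, newdata = wine_test)
confusionMatrix(preds, as.factor(wine_test$good))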

And the reason is found in another question:

caret::train: specify model-generation-parameters

I think you passed tuning parameters as ordinary arguments, while train() is attempting to find the optimal tuning parameters itself. If you do want to set them yourself, you can define a grid of parameters for a grid search, as sketched below.
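For example, a hedged sketch of a user-defined grid, reusing wineTrainInd, wineTrainDep and cv_opts from above (the values are illustrative, not recommendations):

# Grid column names must match the tuning parameters (iter, maxdepth, nu).
# Very old caret versions expected a leading dot (.iter, .maxdepth, .nu).
ada_grid <- expand.grid(iter = c(50, 100, 150),
                        maxdepth = c(1, 2, 3),
                        nu = 0.1)

results_ada <- train(x = wineTrainInd, y = wineTrainDep,
                     method = "ada",
                     trControl = cv_opts,   # the 10-fold CV control defined in the question
                     tuneGrid = ada_grid)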

answered Sep 28 '22 by TomR


What is the type of data in wine$good? If it is supposed to be a factor, make that explicit:

wine$good <- as.factor(wine$good)
stopifnot(is.factor(wine$good))

Reason: R packages often need some help in distinguishing classification from regression scenarios, and there may be generic code inside caret that is mistakenly treating the exercise as a regression problem (ignoring the fact that ada only does classification).
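A quick sanity check along those lines (purely illustrative):

# Confirm the outcome is a two-level factor before calling train();
# a character or numeric column can push caret toward regression.
str(wine$good)
levels(wine$good)   # should show the two class labels, e.g. "Bad" / "Good"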

answered Sep 28 '22 by vijucat