I've been using the ada R package for a while, and more recently, caret. According to the documentation, caret's train() function should have an option that uses ada. But caret is puking at me when I use the same syntax that sits within my ada() call.
Here's a demonstration, using the wine sample data set.
library(doSNOW)
registerDoSNOW(makeCluster(2, type = "SOCK"))
library(caret)
library(ada)
wine = read.csv("http://www.nd.edu/~mclark19/learn/data/goodwine.csv")
set.seed(1234) #so that the indices will be the same when re-run
trainIndices = createDataPartition(wine$good, p = 0.8, list = FALSE)
wanted = !colnames(wine) %in% c("free.sulfur.dioxide", "density", "quality",
"color", "white")
wine_train = wine[trainIndices, wanted]
wine_test = wine[-trainIndices, wanted]
cv_opts = trainControl(method="cv", number=10)
###now, the example that works using ada()
results_ada <- ada(good ~ ., data=wine_train,
control=rpart.control(maxdepth=30, cp=0.010000, minsplit=20, xval=10), iter=500)
##this works, and gives me a confusion matrix.
results_ada
ada(good ~ ., data = wine_train, control = rpart.control(maxdepth = 30,
cp = 0.01, minsplit = 20, xval = 10), iter = 500)
Loss: exponential Method: discrete Iteration: 500
Final Confusion Matrix for Data:
Final Prediction
etc. etc. etc. etc.
##Now, the calls that don't work.
results_ada = train(good~., data=wine_train, method="ada",
control=rpart.control(maxdepth=30, cp=0.010000, minsplit=20,
xval=10), iter=500)
Error in train.default(x, y, weights = w, ...) :
final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
missing values found in aggregated results
###this doesn't work, either
results_ada = train(good~., data=wine_train, method="ada", trControl=cv_opts,
maxdepth=10, nu=0.1, iter=50)
Error in train.default(x, y, weights = w, ...) :
final tuning parameters could not be determined
In addition: Warning messages:
1: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, :
There were missing values in resampled performance measures.
2: In train.default(x, y, weights = w, ...) :
missing values found in aggregated results
I'm guessing it's that train() wants additional input, but the warning thrown doesn't give me any hints on what's missing. Additionally, I could be missing a dependency, but there's no hint on what should be there....
Caret is a one-stop solution for machine learning in R. Its train function lets you fit over 230 different models using one syntax, including various tree-based models, neural nets, deep learning and much more.
AdaBoost can be used to boost the performance of other machine learning algorithms. It is best used with weak learners, that is, models that achieve accuracy just above random chance on a classification problem. The most suitable, and therefore most common, base learners for AdaBoost are decision trees with one level.
AdaBoost algorithms can be used for both classification and regression problems.
Look up ?train and search for ada; you'll see that:
Method Value: ada from package ada with tuning parameters: iter, maxdepth, nu (classification only)
So you must be missing the nu parameter, and the maxdepth parameter.
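If you'd rather check the tuning parameters from the console than from the help page, a minimal sketch along these lines should do it. It relies on caret's modelLookup() helper and is guarded so it does nothing when caret is not installed; the name ada_params is just an illustrative variable.

```r
# Sketch: list the tuning parameters caret expects for method = "ada".
# Guarded so the block still runs (as a no-op) when caret is not installed.
if (requireNamespace("caret", quietly = TRUE)) {
  ada_params <- caret::modelLookup("ada")
  # One row per tuning parameter, with flags for regression/classification support
  print(ada_params[, c("parameter", "label", "forReg", "forClass")])
}
```

The parameter column is what train() varies during resampling, so those are the names to use in any tuning grid.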
So this seems to work:
wineTrainInd <- wine_train[!colnames(wine_train) %in% "good"]
wineTrainDep <- as.factor(wine_train$good)
results_ada = train(x = wineTrainInd, y = wineTrainDep, method="ada")
results_ada
Boosted Classification Trees
5199 samples
9 predictors
2 classes: 'Bad', 'Good'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 5199, 5199, 5199, 5199, 5199, 5199, ...
Resampling results across tuning parameters:
  iter  maxdepth  Accuracy  Kappa  Accuracy SD  Kappa SD
  50    1         0.732     0.397  0.00893      0.0294
  50    2         0.74      0.422  0.00853      0.0187
  50    3         0.747     0.437  0.00759      0.0171
  100   1         0.736     0.411  0.0065       0.0172
  100   2         0.742     0.428  0.0075       0.0173
  100   3         0.748     0.442  0.00756      0.0158
  150   1         0.737     0.417  0.00771      0.0184
  150   2         0.745     0.435  0.00851      0.0198
  150   3         0.752     0.449  0.00736      0.016
Tuning parameter 'nu' was held constant at a value of 0.1
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were iter = 150, maxdepth = 3 and nu
= 0.1.
And the reason is found in another question:
caret::train: specify model-generation-parameters
I think you passed tuning parameters as arguments, when train is attempting to find optimal tuning parameters itself. You could define a grid of parameters for a grid search if you did want to define your own.
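For what it's worth, here is a rough sketch of that grid approach. The values iter = 50, maxdepth = 10 and nu = 0.1 are taken from the question's second call; the toy data frame is a made-up stand-in for wine_train, and the whole fit is skipped unless caret, ada and plyr are installed. Passing the values through tuneGrid rather than as loose arguments avoids the failure, presumably because loose arguments clash with the values train() already supplies to ada().

```r
# Sketch only: a one-row grid fixes the tuning parameters instead of searching.
# Column names must match the tuning parameters of method = "ada".
ada_grid <- expand.grid(iter = 50, maxdepth = 10, nu = 0.1)

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("ada", quietly = TRUE) &&
    requireNamespace("plyr", quietly = TRUE)) {
  set.seed(1234)
  # Hypothetical stand-in for wine_train: two numeric predictors, factor outcome
  toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
  toy$good <- factor(ifelse(toy$x1 + rnorm(200, sd = 0.5) > 0, "Good", "Bad"))

  fit <- caret::train(good ~ ., data = toy, method = "ada",
                      trControl = caret::trainControl(method = "cv", number = 5),
                      tuneGrid = ada_grid)
  print(fit$bestTune)
}
```

To search instead of fixing the values, give expand.grid() vectors, e.g. iter = c(50, 100), and train() will resample every combination.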
What is the type of data in wine$good? If it is a factor, try explicitly mentioning that it is so:
wine$good <- as.factor(wine$good)
stopifnot(is.factor(wine$good))
Reason: R packages often need some help in distinguishing classification from regression scenarios, and there may be some generic code inside caret that mistakenly identifies the exercise as a regression problem (ignoring the fact that ada only does classification).
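As a small illustration of that check, the sketch below uses a made-up data frame (toy, with a hypothetical alcohol column) instead of the real wine data. It matters more on R 4.0.0 and later, where read.csv() no longer converts strings to factors by default, so a column like good may arrive as character.

```r
# Hypothetical stand-in for the wine data: outcome stored as character
toy <- data.frame(alcohol = c(9.4, 12.8, 11.2, 10.1),
                  good = c("Bad", "Good", "Good", "Bad"),
                  stringsAsFactors = FALSE)

print(class(toy$good))  # inspect the type before converting

# Convert explicitly, fixing the level order so "Bad" is the first class
toy$good <- factor(toy$good, levels = c("Bad", "Good"))

# Fail early if the outcome is still not a factor
stopifnot(is.factor(toy$good))
print(levels(toy$good))
```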