A similar question was asked before, but the link in its answer points to a random forest example, and that approach doesn't seem to work in my case.
Here is an example of what I'm trying to do:
gbmGrid <- expand.grid(interaction.depth = c(5, 9),
                       n.trees = (1:3) * 200,
                       shrinkage = c(0.05, 0.1))
fitControl <- trainControl(
  method = "cv",
  number = 3,
  classProbs = TRUE)
gbmFit <- train(strong ~ . - Id - PlayerName, data = train[1:10000, ],
                method = "gbm",
                trControl = fitControl,
                verbose = TRUE,
                tuneGrid = gbmGrid)
gbmFit
Everything works fine and I get the best parameters. Now, if I run the prediction:
predictStrong = predict(gbmFit, newdata=train[11000:50000,])
I get a binary vector of predictions, which is good:
[1] 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 ...
However, when I try to get class probabilities, I get an error:
predictStrong = predict(gbmFit, newdata=train[11000:50000,], type="prob")
Error in `[.data.frame`(out, , obsLevels, drop = FALSE) :
undefined columns selected
What seems to be the problem?
Additional info:
traceback()
5: stop("undefined columns selected")
4: `[.data.frame`(out, , obsLevels, drop = FALSE)
3: out[, obsLevels, drop = FALSE]
2: predict.train(gbmFit, newdata = train[11000:50000, ], type = "prob")
1: predict(gbmFit, newdata = train[11000:50000, ], type = "prob")
Versions:
R version 3.1.0 (2014-04-10) -- "Spring Dance"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
caret version: 6.0-29
EDIT:
I've seen this topic as well, and I don't get an error about variable names. I do have a couple of variable names with underscores, which I assume are valid, because running make.names on them returns the same names as the originals:
colnames(train) == make.names(colnames(train))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
caret (short for Classification And Regression Training) is one of the most widely used R packages. It provides a consistent syntax for data preparation, model building, and model evaluation, which makes it convenient for data science practitioners.
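When class probabilities are requested, train puts them into a data frame with a column for each class. If the factor levels are not valid variable names, they are automatically changed (e.g. "0" becomes "X0"). train issues a warning in this case that goes something like "At least one of the class levels are not valid R variables names. This may cause errors if class probabilities are generated."
In the question, the outcome strong has levels "0" and "1", so the probability columns end up named "X0" and "X1". The lookup out[, obsLevels, drop = FALSE] shown in the traceback then fails, because columns named "0" and "1" do not exist. The make.names check in the EDIT only covers the column names of train, not the levels of the outcome factor, which is why it passes. Relabelling the outcome levels to valid names before calling train should avoid the error.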
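A minimal base-R sketch of the check and the fix described above. The toy vector `strong` is a stand-in for the `strong` column of the asker's `train` data frame, and the labels "no"/"yes" are hypothetical; the point is only that the levels, not the column names, need to be valid R names.

```r
# Toy stand-in for train$strong: an outcome factor coded 0/1.
strong <- factor(c(0, 1, 0, 0, 1))

# Checking colnames(train) does not catch this; check the levels instead.
print(levels(strong))                                      # [1] "0" "1"
print(all(levels(strong) == make.names(levels(strong))))   # [1] FALSE

# Option 1: let make.names() produce valid labels.
strong_x <- strong
levels(strong_x) <- make.names(levels(strong_x))
print(levels(strong_x))                                    # [1] "X0" "X1"

# Option 2: choose descriptive labels yourself (labels are hypothetical).
strong_lbl <- factor(strong, levels = c("0", "1"), labels = c("no", "yes"))
print(levels(strong_lbl))                                  # [1] "no"  "yes"
```

In the question, the same relabelling would be applied to train$strong before calling train(); predict(gbmFit, ..., type = "prob") should then return one column per relabelled class.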