I am attempting to use the F1 score to determine which k value maximises the model for its given purpose. The model is made through the train
function in the caret
package.
Example dataset: https://www.kaggle.com/lachster/churndata
My current code includes the following (as the function for f1 score):
f1 <- function(data, lev = NULL, model = NULL) {
precision <- posPredValue(data$pred, data$obs, positive = "pass")
recall <- sensitivity(data$pred, data$obs, positive = "pass")
f1_val <- (2*precision*recall) / (precision + recall)
names(f1_val) <- c("F1")
f1_val
}
The following as train control:
train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
summaryFunction = f1, search = "grid")
And the following as my final execution of the train
command:
x <- train(CHURN ~. ,
data = experiment,
method = "knn",
tuneGrid = expand.grid(.k=1:30),
metric = "F1",
trControl = train.control)
Please note that the model is attempting to predict the churn rate from a set of telco customers.
The execution returns the following result: Something is wrong; all the F1 metric values are missing:
F1
Min. : NA
1st Qu.: NA
Median : NA
Mean :NaN
3rd Qu.: NA
Max. : NA
NA's :30
Error in train.default(x, y, weights = w, ...) : Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
EDIT: Thanks to help from missuse my code now looks like the following but returns this error
levels(exp2$CHURN) <- make.names(levels(factor(exp2$CHURN)))
library(mlbench)
train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
summaryFunction = prSummary, classProbs = TRUE)
knn_fit <- train(CHURN ~., data = exp2, method = "knn", trControl =
train.control, preProcess = c("center", "scale"), tuneLength = 15, metric = "F")
The error:
Error in trainControl(method = "repeatedcv", number = 10, repeats = 3, :
object 'prSummary' not found
Caret contains a summary function: prSummary
that provides the F1 score Full example:
library(caret)
library(mlbench)
data(Sonar)
train.control <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
summaryFunction = prSummary, classProbs = TRUE)
knn_fit <- train(Class ~., data = Sonar, method = "knn",
trControl=train.control ,
preProcess = c("center", "scale"),
tuneLength = 15,
metric = "F")
knn_fit
#output
k-Nearest Neighbors
208 samples
60 predictor
2 classes: 'M', 'R'
Pre-processing: centered (60), scaled (60)
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 187, 188, 187, 188, 187, 187, ...
Resampling results across tuning parameters:
k AUC Precision Recall F
5 0.3582687 0.7936713 0.9065657 0.8414592
7 0.4985709 0.7758271 0.8883838 0.8239438
9 0.6632328 0.7484092 0.8853535 0.8089210
11 0.7426320 0.7151175 0.8676768 0.7814297
13 0.7388742 0.6883105 0.8646465 0.7641392
15 0.7594436 0.6787983 0.8467172 0.7520524
17 0.7583071 0.6909693 0.8527778 0.7616448
19 0.7702208 0.6913001 0.8585859 0.7644433
21 0.7642698 0.6962528 0.8707071 0.7719442
23 0.7652370 0.6945755 0.8707071 0.7696863
25 0.7606508 0.6929364 0.8707071 0.7683987
27 0.7454728 0.6916762 0.8676768 0.7669464
29 0.7551679 0.6900416 0.8707071 0.7676640
31 0.7603099 0.6935720 0.8828283 0.7749490
33 0.7614621 0.6938805 0.8770202 0.7728923
F was used to select the optimal model using the largest value.
The final value used for the model was k = 5.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With