I am using an rpart
classifier in R. The question is - I would want to test the trained classifier on a test data. This is fine - I can use the predict.rpart
function.
But I also want to calculate precision, recall and F1 score.
My question is - do I have to write functions for those myself, or is there any function in R or any of CRAN libraries for that?
For example, a perfect precision and recall score would result in a perfect F-Measure score: F-Measure = (2 * Precision * Recall) / (Precision + Recall) F-Measure = (2 * 1.0 * 1.0) / (1.0 + 1.0) F-Measure = (2 * 1.0) / 2.0.
using the caret package:
library(caret) y <- ... # factor of positive / negative cases predictions <- ... # factor of predictions precision <- posPredValue(predictions, y, positive="1") recall <- sensitivity(predictions, y, positive="1") F1 <- (2 * precision * recall) / (precision + recall)
A generic function that works for binary and multi-class classification without using any package is:
f1_score <- function(predicted, expected, positive.class="1") { predicted <- factor(as.character(predicted), levels=unique(as.character(expected))) expected <- as.factor(expected) cm = as.matrix(table(expected, predicted)) precision <- diag(cm) / colSums(cm) recall <- diag(cm) / rowSums(cm) f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall)) #Assuming that F1 is zero when it's not possible compute it f1[is.na(f1)] <- 0 #Binary F1 or Multi-class macro-averaged F1 ifelse(nlevels(expected) == 2, f1[positive.class], mean(f1)) }
Some comments about the function:
positive.class
is used only in binary f1 predicted
and expected
had different levels, predicted
will receive the expected
levelsThe ROCR library calculates all these and more (see also http://rocr.bioinf.mpi-sb.mpg.de):
library (ROCR); ... y <- ... # logical array of positive / negative cases predictions <- ... # array of predictions pred <- prediction(predictions, y); # Recall-Precision curve RP.perf <- performance(pred, "prec", "rec"); plot (RP.perf); # ROC curve ROC.perf <- performance(pred, "tpr", "fpr"); plot (ROC.perf); # ROC area under the curve auc.tmp <- performance(pred,"auc"); auc <- as.numeric([email protected]) ...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With