I am using an <code>rpart</code> classifier in R and want to evaluate the trained classifier on test data. Getting the predictions is fine: I can use the <code>predict.rpart</code> function. But I also want to calculate precision, recall and F1 score. Do I have to write functions for those myself, or is there a function in R or in one of the CRAN packages for that?

Using the caret package: <pre class="prettyprint"><code>library(caret)

y <- ... # factor of positive / negative cases
predictions <- ... # factor of predictions

# precision is the positive predictive value, recall is the sensitivity
precision <- posPredValue(predictions, y, positive="1")
recall <- sensitivity(predictions, y, positive="1")

F1 <- (2 * precision * recall) / (precision + recall)
</code></pre>
A generic function that works for binary and multi-class classification without using any package is:
<pre class="prettyprint"><code>f1_score <- function(predicted, expected, positive.class="1") {
    # Give both factors the same levels in the same order, so that the rows
    # and columns of the confusion matrix line up and diag(cm) holds the hits
    expected <- as.factor(expected)
    predicted <- factor(as.character(predicted), levels=levels(expected))
    cm <- as.matrix(table(expected, predicted))

    # Per-class precision (column-wise) and recall (row-wise)
    precision <- diag(cm) / colSums(cm)
    recall <- diag(cm) / rowSums(cm)
    f1 <- ifelse(precision + recall == 0, 0, 2 * precision * recall / (precision + recall))

    # Assume that F1 is zero when it cannot be computed (0/0 gives NaN)
    f1[is.na(f1)] <- 0

    # Binary F1 for the positive class, or macro-averaged F1 for multi-class
    if (nlevels(expected) == 2) f1[[as.character(positive.class)]] else mean(f1)
}
</code></pre>
Some comments about the function: <ul> <li>An F1 that cannot be computed (<code>NA</code> or <code>NaN</code>) is assumed to be zero</li> <li><code>positive.class</code> is used only for the binary F1</li> <li>For multi-class problems, the macro-averaged F1 (the unweighted mean of the per-class F1 scores) is computed</li> <li>If <code>predicted</code> and <code>expected</code> have different levels, <code>predicted</code> receives the levels of <code>expected</code>; predicted values that never occur in <code>expected</code> become <code>NA</code> and are dropped from the confusion matrix</li> </ul>
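To make the arithmetic behind both snippets concrete, here is a small base-R sketch on made-up labels. The vectors `expected` and `predicted` are invented for illustration and are not from the question; the code only shows how precision, recall and F1 fall out of a confusion table.

```r
# Toy labels, invented for illustration only (1 = positive, 0 = negative)
expected  <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0), levels = c(0, 1))
predicted <- factor(c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0), levels = c(0, 1))

# Rows are the true classes, columns are the predicted classes
cm <- table(expected, predicted)

tp <- cm["1", "1"]  # true positives:  3
fp <- cm["0", "1"]  # false positives: 1
fn <- cm["1", "0"]  # false negatives: 1

precision <- tp / (tp + fp)                          # 3 / 4 = 0.75
recall    <- tp / (tp + fn)                          # 3 / 4 = 0.75
f1 <- 2 * precision * recall / (precision + recall)  # 0.75

# The same quantities for every class at once, plus the macro average
class_precision <- diag(cm) / colSums(cm)
class_recall    <- diag(cm) / rowSums(cm)
class_f1 <- 2 * class_precision * class_recall / (class_precision + class_recall)
macro_f1 <- mean(class_f1)
```

Because both factors are created with the same `levels`, the diagonal of `cm` really contains the correctly classified cases for each class, which is what the per-class formulas rely on.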

The ROCR library calculates all these and more (see also http://rocr.bioinf.mpi-sb.mpg.de): <pre class="prettyprint"><code>library(ROCR)
...
y <- ... # logical array of positive / negative cases
predictions <- ... # array of predictions

pred <- prediction(predictions, y)

# Recall-Precision curve
RP.perf <- performance(pred, "prec", "rec")
plot(RP.perf)

# ROC curve
ROC.perf <- performance(pred, "tpr", "fpr")
plot(ROC.perf)

# ROC area under the curve
auc.tmp <- performance(pred, "auc")
auc <- as.numeric(auc.tmp@y.values)
...
</code></pre>
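For the rpart case in the question, the pieces might be wired together roughly as follows. This is only a sketch under several assumptions: it uses the `kyphosis` example data bundled with rpart, it scores the training data purely to stay self-contained (with a real test set you would pass it as `newdata`), and it relies on ROCR's `"f"` measure, which is the balanced F-measure (F1) when no `alpha` is given. The variable names are illustrative.

```r
# Sketch only: runs when both rpart and ROCR are installed
if (requireNamespace("rpart", quietly = TRUE) &&
    requireNamespace("ROCR", quietly = TRUE)) {

  # kyphosis is an example data set shipped with rpart
  kyphosis <- rpart::kyphosis
  fit <- rpart::rpart(Kyphosis ~ Age + Number + Start,
                      data = kyphosis, method = "class")

  # Class probabilities; the "present" column is the score of the positive class.
  # Assumption: the training data stands in for a held-out test set here.
  scores <- predict(fit, newdata = kyphosis, type = "prob")[, "present"]
  labels <- kyphosis$Kyphosis == "present"

  pred <- ROCR::prediction(scores, labels)

  # F-measure at every score cutoff (balanced, i.e. F1, by default)
  f_perf   <- ROCR::performance(pred, "f")
  cutoffs  <- f_perf@x.values[[1]]
  f_values <- f_perf@y.values[[1]]

  # Cutoff with the highest F1; which.max() skips NaN entries
  best <- which.max(f_values)
  cat("best cutoff:", cutoffs[best], " F1:", f_values[best], "\n")
}
```

ROCR works on numeric scores rather than hard class labels, which is why the sketch asks `predict` for `type = "prob"` instead of `type = "class"`.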

Easy way of counting precision, recall and F1-score in R

2 Answers

answered Sep 23 '22 12:09

Adriano Rivolli

answered Sep 22 '22 12:09

Itamar

Tags:

r

classification

precision-recall

auc

Karel Bílek
