
What is a bad, decent, good, and excellent F1-measure range?

I understand that the F1-measure is the harmonic mean of precision and recall. But what values define how good or bad an F1-measure is? I can't seem to find any references (Google or academic) answering my question.

KubiK888 asked Apr 19 '16


2 Answers

Consider sklearn.dummy.DummyClassifier(strategy='uniform'), a classifier that makes random guesses (i.e., a deliberately bad classifier). We can treat DummyClassifier as a benchmark to beat; now let's look at its f1-score.

In a binary classification problem with a balanced dataset (6198 samples total: 3099 labelled 0 and 3099 labelled 1), the f1-score is 0.5 for both classes, and the weighted average is 0.5:

(classification report for strategy='uniform')

Second example, using DummyClassifier(strategy='constant'), i.e. predicting the same label every time (label 1 in this case): the average of the f1-scores is 0.33, while the f1 for label 0 is 0.00:

(classification report for strategy='constant')

I consider these to be bad f1-scores, given the balanced dataset.

PS: the summaries were generated using sklearn.metrics.classification_report.
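The two baseline reports above can be reproduced with a short sketch like the following (the dataset sizes match the answer; the single constant feature column is my own assumption, since a DummyClassifier ignores the features anyway):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report

# Balanced binary dataset: 3099 samples of each class. Features are
# irrelevant to a DummyClassifier, so one constant column suffices.
y = np.array([0] * 3099 + [1] * 3099)
X = np.zeros((len(y), 1))

# Random guessing: f1 hovers around 0.5 for both classes.
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
print(classification_report(y, uniform.predict(X)))

# Always predicting label 1: f1 is 0.00 for class 0 and ~0.67 for
# class 1, so the macro average is ~0.33.
constant = DummyClassifier(strategy="constant", constant=1).fit(X, y)
print(classification_report(y, constant.predict(X)))
```

Note that the constant-strategy numbers are deterministic, while the uniform strategy is randomized, so its per-class f1 only approximates 0.5 on any single run.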

Sida Zhou answered Sep 22 '22


You did not find any reference for an f1-measure range because there is no such range. The F1-measure is a combined metric of precision and recall.

Let's say you have two algorithms: one has higher precision, the other higher recall. From this observation alone, you cannot tell which algorithm is better, unless your goal is to maximize precision (or recall) specifically.

So, given this ambiguity about how to choose between the two algorithms (one with higher recall, the other with higher precision), we use the f1-measure to pick the superior one.

The f1-measure is a relative term; that's why there is no absolute range that defines how good your algorithm is.
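To make the comparison concrete, here is a small sketch of that reasoning. The precision/recall numbers are illustrative assumptions, not taken from the answer; F1 is just the harmonic mean, which penalises imbalance between the two:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Algorithm A: high precision, low recall.
# Algorithm B: lower precision, but more balanced overall.
print(f1(0.9, 0.4))  # ~0.55
print(f1(0.7, 0.6))  # ~0.65 -> B wins on F1 despite lower precision
```

Neither score is "good" or "bad" in absolute terms; F1 only tells you that B makes the better precision/recall trade-off here.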

saurabh agarwal answered Sep 22 '22