I understand the F1-measure is the harmonic mean of precision and recall. But what values define how good or bad an F1-measure is? I can't seem to find any references (Google or academic) answering my question.
For context: this is a binary classification task. Clearly, the higher the F1 score the better, with 0 being the worst possible and 1 being the best. Beyond this, most online sources don't give you any idea of how to interpret a specific F1 score. Was my F1 score of 0.56 good or bad?
Notice that the F1-score takes both precision and recall into account, which means it also accounts for both FPs and FNs. The higher the precision and recall, the higher the F1-score. The F1-score ranges between 0 and 1; the closer it is to 1, the better the model.
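For reference, the standard definition being used here:

F1 = 2 · (precision · recall) / (precision + recall)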
Here precision is fixed at 0.8, while recall varies from 0.01 to 1.0 as before:

[Figure: F1-score when precision is fixed at 0.8 and recall varies from 0.01 to 1.0.]
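A minimal sketch of the calculation behind that figure (plain Python; the recall values are illustrative, not the exact ones plotted):

```python
# F1 with precision fixed at 0.8 while recall sweeps from low to high.
precision = 0.8
for recall in [0.01, 0.1, 0.25, 0.5, 0.75, 1.0]:
    f1 = 2 * precision * recall / (precision + recall)
    print(f"recall={recall:.2f} -> F1={f1:.3f}")
```

Note that even with perfect recall, F1 tops out at about 0.89 here, because the harmonic mean is dominated by the lower of the two values.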
Consider sklearn.dummy.DummyClassifier(strategy='uniform'), a classifier that makes random guesses (a.k.a. a bad classifier). We can view DummyClassifier as a benchmark to beat; now let's see its f1-score.
In a binary classification problem with a balanced dataset (6198 total samples: 3099 labelled 0 and 3099 labelled 1), the f1-score is 0.5 for both classes, and the weighted average is 0.5:
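A minimal sketch to reproduce this, assuming a synthetic balanced dataset like the one described (the report's exact numbers will vary slightly with the random guesses):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report

# Balanced synthetic dataset: 3099 samples per class, 6198 total.
y = np.array([0] * 3099 + [1] * 3099)
X = np.zeros((len(y), 1))  # features are ignored by DummyClassifier

clf = DummyClassifier(strategy='uniform', random_state=0).fit(X, y)
print(classification_report(y, clf.predict(X)))
# f1-score comes out near 0.50 for both classes, weighted avg ~0.50
```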
Second example, using DummyClassifier(strategy='constant'), i.e. guessing the same label every time (label 1 in this case): the macro average of the f1-scores is 0.33, while the f1 for label 0 is 0.00:
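A sketch of this second case (note that strategy='constant' requires the constant argument, and zero_division=0, available in scikit-learn 0.22+, silences the warning for label 0, which is never predicted):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import classification_report

y = np.array([0] * 3099 + [1] * 3099)
X = np.zeros((len(y), 1))

clf = DummyClassifier(strategy='constant', constant=1).fit(X, y)  # always predict 1
print(classification_report(y, clf.predict(X), zero_division=0))
# label 1: precision 0.50, recall 1.00 -> f1 ~0.67
# label 0: f1 0.00; macro average ~0.33
```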
I consider these to be bad f1-scores, given the balanced dataset.
P.S. The summaries were generated using sklearn.metrics.classification_report.
You did not find any reference for what counts as a good f1-measure because there is no such range: the F1 measure is a combined metric of precision and recall.
Let's say you have two algorithms: one has higher precision and lower recall than the other. From this observation alone, you cannot tell which algorithm is better, unless your goal is simply to maximize precision.
So, given this ambiguity about how to select the superior algorithm of the two (one with higher recall, the other with higher precision), we use the f1-measure to choose between them.
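A quick illustration with made-up precision/recall pairs (purely hypothetical numbers):

```python
# Compare two hypothetical algorithms on F1 alone.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(f1(precision=0.9, recall=0.5))  # algorithm A: ~0.643
print(f1(precision=0.6, recall=0.8))  # algorithm B: ~0.686 -> B wins on F1
```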
The f1-measure is a relative term; that's why there is no absolute range that defines how good your algorithm is.