
Macro VS Micro VS Weighted VS Samples F1 Score

In sklearn.metrics.f1_score, the F1 score has a parameter called "average". What do macro, micro, weighted, and samples mean? Please elaborate, because the documentation does not explain them properly. Or simply answer the following:

  1. Why is "samples" the best parameter for multilabel classification?
  2. Why is micro best for an imbalanced dataset?
  3. What's the difference between weighted and macro?
Code Geek asked Apr 18 '19 06:04


People also ask


Should I use macro or micro F1 score?

If you have an imbalanced dataset then you should use macro F1 score as this will still reflect true model performance even when the classes are skewed. However, if you have a balanced dataset then micro F1 score could be considered, especially if communicating the results with end users is important.

What does Weighted F1 score mean?

The weighted-averaged F1 score is calculated by taking the mean of all per-class F1 scores while considering each class's support. Support refers to the number of actual occurrences of the class in the dataset.

What is Micro F1 score?

Micro F1 score (short for micro-averaged F1 score) is computed from the aggregated contributions of all classes: the true positives, false positives, and false negatives are summed over every class, and a single F1 score is calculated from those totals. It is often used to assess multi-label problems.


2 Answers

The question is about the meaning of the average parameter in sklearn.metrics.f1_score.

As you can see from the code:

  • average='micro' tells the function to compute F1 globally, from the total true positives, false negatives and false positives summed over all labels (regardless of how each individual label is predicted).
  • average='macro' tells the function to compute F1 for each label and return the unweighted mean, without considering the proportion of each label in the dataset.
  • average='weighted' tells the function to compute F1 for each label and return the mean weighted by each label's support (its number of true instances in the dataset).
  • average='samples' tells the function to compute F1 for each instance and return the mean. It is only meaningful for multilabel classification.
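To make the four options concrete, here is a minimal sketch that recomputes each average by hand and compares it with f1_score. The toy labels are made up for illustration and are not from the question; the manual formulas are my reading of the definitions above, not a copy of the library's implementation.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Toy single-label, 3-class data (invented for this example).
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

per_class = f1_score(y_true, y_pred, average=None)  # one F1 per label
support = np.bincount(y_true)                       # true instances per label

macro = f1_score(y_true, y_pred, average="macro")
weighted = f1_score(y_true, y_pred, average="weighted")
micro = f1_score(y_true, y_pred, average="micro")

# macro: plain mean of the per-label scores
macro_manual = per_class.mean()
# weighted: mean of the per-label scores, weighted by support
weighted_manual = np.average(per_class, weights=support)
# micro: in single-label problems the pooled counts reduce to accuracy
accuracy = accuracy_score(y_true, y_pred)

# Toy multilabel data as a binary indicator matrix (rows = instances).
Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
Y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])

samples = f1_score(Y_true, Y_pred, average="samples")

# samples: F1 per row (instance), then the mean over rows
tp = (Y_true & Y_pred).sum(axis=1)
samples_manual = (2 * tp / (Y_true.sum(axis=1) + Y_pred.sum(axis=1))).mean()

print(macro, weighted, micro, samples)
```

On data like this, macro and weighted differ only because the classes have different supports; with perfectly balanced classes the two would coincide.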
sentence answered Sep 23 '22 22:09


I found a really helpful article explaining the differences more thoroughly and with examples: https://towardsdatascience.com/multi-class-metrics-made-simple-part-ii-the-f1-score-ebe8b2c2ca1

Unfortunately, it doesn't cover the 'samples' parameter, and I have not experimented with multi-label classification yet, so I'm not able to answer question number 1. As for the others:

  1. Regarding question 2: where does this information come from? If I understood the differences correctly, micro is not the best indicator for an imbalanced dataset but one of the worst, because it counts every sample equally and is therefore dominated by the majority class. As described in the article, micro-F1 equals accuracy in single-label classification, and accuracy is a flawed indicator for imbalanced data. For example: a classifier is supposed to identify cat pictures among thousands of random pictures, and only 1% of the data set consists of cat pictures (an imbalanced data set). Even if it does not identify a single cat picture, it has an accuracy / micro-F1 score of 99%, since 99% of the data was correctly identified as not cat pictures.

  2. Regarding question 3, in a nutshell: macro is simply the arithmetic mean of the per-class scores, while weighted averages the per-class scores in proportion to each class's support (its sample size). I recommend the article for details; I can provide more examples if needed.
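The cat example can be checked with a short sketch. The sizes (1,000 pictures, 10 of them cats) are my own assumption to match the 1% figure, and the "classifier" is simply a stand-in that predicts "not cat" for everything. zero_division=0 is passed only to avoid the undefined-precision warning some scikit-learn versions emit for the cat class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Assumed setup: 1,000 pictures, 1% cats (label 1), the rest not cats (label 0).
y_true = np.array([1] * 10 + [0] * 990)
# A useless classifier that never predicts "cat".
y_pred = np.zeros(1000, dtype=int)

accuracy = accuracy_score(y_true, y_pred)
micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
weighted = f1_score(y_true, y_pred, average="weighted", zero_division=0)
# F1 of the cat class alone (the default binary average with pos_label=1)
cat_f1 = f1_score(y_true, y_pred, pos_label=1, zero_division=0)

print(f"accuracy={accuracy:.3f} micro={micro:.3f} "
      f"macro={macro:.3f} weighted={weighted:.3f} cat_f1={cat_f1:.3f}")
```

Micro matches accuracy and weighted stays close to it, while macro drops to roughly half because the cat class, with an F1 of zero, counts as much as the majority class.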

I know that the question is quite old, but I hope this helps someone. Please correct me if I'm wrong. I've done some research, but am not an expert.

3scapeX answered Sep 21 '22 22:09