
scikit weighted f1 score calculation and usage

I have a question regarding weighted average in sklearn.metrics.f1_score

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

First, is there any reference that justifies the usage of weighted-F1? I am curious about the cases in which I should use weighted-F1.

Second, I heard that weighted-F1 is deprecated. Is that true?

Third, how is weighted-F1 actually calculated? For example:

{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}

How do I calculate the weighted-F1 of the above example? I thought it should be something like (0.8*2/3 + 0.4*1/3)/3, but I was wrong.

asked Oct 25 '15 by com



1 Answer

First, is there any reference that justifies the usage of weighted-F1? I am curious about the cases in which I should use weighted-F1.

I don't have any references, but if you're interested in multiclass (or multi-label) classification where you care about the precision/recall of all classes, then the weighted F1 score is appropriate. If you have binary classification where you only care about the positive samples, then it is probably not appropriate.
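A small sketch of that distinction (the toy labels below are my own, not from the question): on an imbalanced binary problem, `average='binary'` scores only the positive class, while `average='weighted'` lets the dominant negative class pull the score up.

```python
from sklearn.metrics import f1_score

# Illustrative imbalanced binary data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# 'binary' scores only the positive class (pos_label=1):
# TP=1, FP=1, FN=1 -> precision = recall = F1 = 0.5
binary_f1 = f1_score(y_true, y_pred, average='binary')

# 'weighted' averages the per-class F1 scores weighted by support,
# so the well-predicted majority class (F1 = 0.875, support 8)
# dominates: (0.875*8 + 0.5*2) / 10 = 0.8
weighted_f1 = f1_score(y_true, y_pred, average='weighted')

print(binary_f1, weighted_f1)
```

If you report the weighted score on data like this, a mediocre positive-class F1 of 0.5 is reported as 0.8, which is exactly why it is a poor choice when only the positive class matters.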

Second, I heard that weighted-F1 is deprecated. Is that true?

No, weighted-F1 itself is not being deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and then only to make it more explicit in previously ambiguous situations. (Historical discussion on github or check out the source code and search the page for "deprecated" to find details.)

Third, how is weighted-F1 actually calculated?

From the documentation of f1_score:

``'weighted'``:
  Calculate metrics for each label, and find their average, weighted
  by support (the number of true instances for each label). This
  alters 'macro' to account for label imbalance; it can result in an
  F-score that is not between precision and recall.

So the average is weighted by the support, which is the number of samples with a given true label. Your example data above does not list the support directly, but it can be recovered as TP + FN for each class: [2, 2, 3]. Taking the F1 of class 1 as 0 (it has no true positives, which is presumably what your -1 marks), the weighted F1 is (0.8·2 + 0·2 + 0.4·3) / 7 = 2.8 / 7 = 0.4.
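As a sanity check, the 'weighted' average can be reproduced by hand from the per-class F1 scores and supports. The labels below are illustrative toy data, not derived from the question:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy 3-class data, chosen purely for illustration.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 2, 2, 1, 0])

# Per-class F1 scores, and support = number of true instances per class.
per_class_f1 = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)           # array([3, 2, 3])

# 'weighted' is the support-weighted mean of the per-class scores.
manual = np.sum(per_class_f1 * support) / support.sum()
sklearn_weighted = f1_score(y_true, y_pred, average='weighted')

print(manual, sklearn_weighted)         # the two values agree
```

Note that `support.sum()` is just the total number of samples, so the weights `support / support.sum()` sum to 1, unlike the (0.8*2/3 + 0.4*1/3)/3 guess in the question.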

answered Sep 21 '22 by jakevdp