
scikit weighted f1 score calculation and usage

I have a question regarding weighted average in sklearn.metrics.f1_score

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

First, is there any reference that justifies the usage of weighted-F1? I am curious about the cases in which I should use weighted-F1.

Second, I heard that weighted-F1 is deprecated. Is that true?

Third, how is weighted-F1 actually calculated? For example:

{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}

How do I calculate the weighted-F1 of the above example? I thought it should be something like (0.8*2/3 + 0.4*1/3)/3, but I was wrong.

asked Oct 25 '15 by com



1 Answer

First, is there any reference that justifies the usage of weighted-F1? I am curious about the cases in which I should use weighted-F1.

I don't have any references, but if you're interested in multiclass (or multi-label) classification where you care about the precision/recall of all classes, then the weighted F1 score is appropriate. If you have binary classification where you only care about the positive samples, then it is probably not appropriate.
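A small sketch of that distinction (the toy labels below are my own, not from the question): on an imbalanced binary problem, `average='binary'` scores only the positive class, while `average='weighted'` lets the dominant negative class pull the score up.

```python
from sklearn.metrics import f1_score

# Illustrative imbalanced binary data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# 'binary' scores only the positive class (pos_label=1):
# TP=1, FP=1, FN=1 -> precision = recall = F1 = 0.5
binary_f1 = f1_score(y_true, y_pred, average='binary')

# 'weighted' averages the per-class F1 scores weighted by support,
# so the well-predicted majority class (F1 = 0.875, support 8)
# dominates: (0.875*8 + 0.5*2) / 10 = 0.8
weighted_f1 = f1_score(y_true, y_pred, average='weighted')

print(binary_f1, weighted_f1)
```

If you report the weighted score on data like this, a mediocre positive-class F1 of 0.5 is reported as 0.8, which is exactly why it is a poor choice when only the positive class matters.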

Second, I heard that weighted-F1 is deprecated. Is that true?

No, weighted-F1 itself is not being deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and then only to make it more explicit in previously ambiguous situations. (Historical discussion on github or check out the source code and search the page for "deprecated" to find details.)

Third, how is weighted-F1 actually calculated?

From the documentation of f1_score:

``'weighted'``:
  Calculate metrics for each label, and find their average, weighted
  by support (the number of true instances for each label). This
  alters 'macro' to account for label imbalance; it can result in an
  F-score that is not between precision and recall.

So the average is weighted by the support, which is the number of samples with a given true label. Your example data above does not list the support directly, but it can be recovered as TP + FN for each class: [2, 2, 3]. Taking the F1 of class 1 as 0 (it has no true positives, which is presumably what your -1 marks), the weighted F1 is (0.8·2 + 0·2 + 0.4·3) / 7 = 2.8 / 7 = 0.4.
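As a sanity check, the 'weighted' average can be reproduced by hand from the per-class F1 scores and supports. The labels below are illustrative toy data, not derived from the question:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy 3-class data, chosen purely for illustration.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 2, 2, 1, 0])

# Per-class F1 scores, and support = number of true instances per class.
per_class_f1 = f1_score(y_true, y_pred, average=None)
support = np.bincount(y_true)           # array([3, 2, 3])

# 'weighted' is the support-weighted mean of the per-class scores.
manual = np.sum(per_class_f1 * support) / support.sum()
sklearn_weighted = f1_score(y_true, y_pred, average='weighted')

print(manual, sklearn_weighted)         # the two values agree
```

Note that `support.sum()` is just the total number of samples, so the weights `support / support.sum()` sum to 1, unlike the (0.8*2/3 + 0.4*1/3)/3 guess in the question.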

answered Sep 21 '22 by jakevdp