
How To Calculate F1-Score For Multilabel Classification?

I am trying to calculate the f1_score, but I get warnings in some cases when I use the sklearn f1_score method.

I have a multilabel prediction problem with 5 classes.

import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

y_true = np.zeros((1,5))
y_true[0,0] = 1 # => label = [[1, 0, 0, 0, 0]]

y_pred = np.zeros((1,5))
y_pred[:] = 1 # => prediction = [[1, 1, 1, 1, 1]]

result_1 = f1_score(y_true=y_true, y_pred=y_pred, labels=None, average="weighted")

print(result_1) # prints 1.0

result_2 = precision_recall_fscore_support(y_true=y_true, y_pred=y_pred, labels=None, average="weighted")

print(result_2) # prints: (1.0, 1.0, 1.0, None) for precision/recall/fbeta_score/support

When I use average="samples" instead of "weighted" I get (0.1, 1.0, 0.1818..., None). Is the "weighted" option not useful for a multilabel problem, or how do I use the f1_score method correctly?

I also get a warning when using average="weighted":

"UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples."

asked Oct 13 '17 by KyleReemoN-

People also ask

How do you calculate an F1 score?

The F1 score is 2 * ((precision * recall) / (precision + recall)). It is also called the F-score or the F-measure. Put another way, the F1 score conveys the balance between precision and recall.
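As a tiny illustrative sketch (the precision and recall values below are made up):

# Illustrative only: F1 from an assumed precision and recall.
precision = 0.75
recall = 0.60
f1 = 2 * ((precision * recall) / (precision + recall))
print(f1)  # 0.666...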

Can we use F1 score for multiclass classification?

In Python's scikit-learn library, we can use the f1_score function to calculate the per-class scores of a multi-class classification problem.
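For example, a minimal sketch with made-up multi-class labels, using average=None to get one F1 score per class:

from sklearn.metrics import f1_score

# Hypothetical 3-class labels.
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 1, 1, 0, 2, 2]

print(f1_score(y_true, y_pred, average=None))     # per-class: [1.  0.5 0.5]
print(f1_score(y_true, y_pred, average="macro"))  # unweighted mean: 0.666...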

How do you measure performance of multi label classification?

Hamming Loss is one of the most well-known multi-label evaluation measures. It takes into account the prediction error (when an incorrect label is predicted) and the missing error (when a relevant label is not predicted), normalized over the total number of classes and the total number of instances (Sorower, 2010).
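A minimal sketch of Hamming loss with scikit-learn, on small made-up indicator matrices:

import numpy as np
from sklearn.metrics import hamming_loss

# Hypothetical multilabel indicator matrices: 3 samples x 4 labels.
y_true = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]])
y_pred = np.array([[1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]])

# Fraction of wrongly predicted label slots: 1 mismatch out of 12.
print(hamming_loss(y_true, y_pred))  # 0.0833...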

How is F1 weighted score calculated?

The weighted-averaged F1 score is calculated by taking the mean of all per-class F1 scores while considering each class's support. Support refers to the number of actual occurrences of the class in the dataset.
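A small sketch (made-up labels) showing that the support-weighted average of per-class F1 scores matches average="weighted":

import numpy as np
from sklearn.metrics import f1_score

# Hypothetical multi-class labels.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

per_class_f1 = f1_score(y_true, y_pred, average=None)  # [0.8 0.8 1. ]
support = np.bincount(y_true)                          # [3 2 1]

print(np.average(per_class_f1, weights=support))     # 0.8333...
print(f1_score(y_true, y_pred, average="weighted"))  # same value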


1 Answer

It works if you add a bit more data:

import numpy as np
from sklearn.metrics import recall_score, precision_score, f1_score

y_true = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]])
y_pred = np.array([[1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]])

recall_score(y_true=y_true, y_pred=y_pred, average='weighted')
>>> 1.0
precision_score(y_true=y_true, y_pred=y_pred, average='weighted')
>>> 0.9285714285714286

f1_score(y_true=y_true, y_pred=y_pred, average='weighted')
>>> 0.95238095238095244

The data suggests we have not missed any true positives, i.e. there are no false negatives (recall_score equals 1). However, we have predicted one false positive in the second observation, which leads to a precision_score of about 0.93.

As both precision_score and recall_score are non-zero with the weighted parameter, f1_score is therefore defined as well. I believe your case is degenerate due to the lack of information in a single example: most of the labels have no true samples at all, which is exactly what the warning is telling you.
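If it helps, the per-label breakdown (average=None) on the same arrays makes the single false positive on label 2 visible:

import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]])
y_pred = np.array([[1, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]])

# Only label 2 is hurt by the false positive in the second row.
print(precision_score(y_true, y_pred, average=None))  # [1.  1.  0.5 1. ]
print(recall_score(y_true, y_pred, average=None))     # [1. 1. 1. 1.]
print(f1_score(y_true, y_pred, average=None))         # [1.  1.  0.667 1. ]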

answered Sep 20 '22 by E.Z.