I have tried many examples with F1 micro and Accuracy in scikit-learn and in all of them, I see that F1 micro is the same as Accuracy. Is this always true?
Script
from sklearn import svm
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import f1_score, accuracy_score

# prepare dataset
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# svm classification
clf = svm.SVC(kernel='rbf', gamma=0.7, C=1.0).fit(X_train, y_train)
y_predicted = clf.predict(X_test)

# performance
print("Classification report for %s" % clf)
print(metrics.classification_report(y_test, y_predicted))
print("F1 micro: %1.4f\n" % f1_score(y_test, y_predicted, average='micro'))
print("F1 macro: %1.4f\n" % f1_score(y_test, y_predicted, average='macro'))
print("F1 weighted: %1.4f\n" % f1_score(y_test, y_predicted, average='weighted'))
print("Accuracy: %1.4f" % accuracy_score(y_test, y_predicted))
Output
Classification report for SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma=0.7, kernel='rbf', max_iter=-1,
  probability=False, random_state=None, shrinking=True, tol=0.001, verbose=False)

             precision    recall  f1-score   support

          0       1.00      0.90      0.95        10
          1       0.50      0.88      0.64         8
          2       0.86      0.50      0.63        12

avg / total       0.81      0.73      0.74        30

F1 micro: 0.7333
F1 macro: 0.7384
F1 weighted: 0.7381
Accuracy: 0.7333
F1 micro = Accuracy
In classification tasks where every test case is guaranteed to be assigned to exactly one class, micro-averaged F1 is equivalent to accuracy. This no longer holds in multi-label classification, where a sample can carry zero or several labels.
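A quick way to convince yourself (a minimal sketch with made-up labels, not the iris setup from the question): in single-label multi-class predictions, every misclassified sample counts as exactly one false positive (for the predicted class) and one false negative (for the true class), so pooled precision, pooled recall, and accuracy all coincide.

from sklearn.metrics import accuracy_score, f1_score

# toy single-label multi-class predictions (made up for illustration)
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

print(accuracy_score(y_true, y_pred))             # 0.75
print(f1_score(y_true, y_pred, average='micro'))  # 0.75, same value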
F1 score vs Accuracy
Remember that the F1 score balances precision and recall on the positive class, while accuracy looks at correctly classified observations of both the positive and the negative class.
Accuracy is the right choice when true positives and true negatives matter most, while the F1 score is used when false negatives and false positives are the crucial errors.
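To make that concrete, here is a toy imbalanced example (labels made up for illustration): a classifier that almost never predicts the positive class can score high accuracy while its F1 on the positive class stays low, because F1 ignores all the true negatives.

from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # only two positives
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # one positive is missed

print(accuracy_score(y_true, y_pred))  # 0.9  -- boosted by the many true negatives
print(f1_score(y_true, y_pred))        # ~0.67 -- binary F1 on the positive class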
Micro F1 score (short for micro-averaged F1 score) is used to assess classifier quality on multi-class and multi-label problems. It measures the F1 score of the aggregated contributions of all classes. If you are selecting a model based on a balance between precision and recall, don't miss out on assessing your F1 scores.
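As a sketch of what "aggregated contributions" means (again with made-up labels): micro-averaging pools the true positives, false positives and false negatives of every class before computing precision, recall and F1, rather than averaging per-class scores.

import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 0, 0])

# pool TP/FP/FN over all classes
tp = fp = fn = 0
for c in np.unique(np.concatenate([y_true, y_pred])):
    tp += np.sum((y_pred == c) & (y_true == c))
    fp += np.sum((y_pred == c) & (y_true != c))
    fn += np.sum((y_pred != c) & (y_true == c))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
micro_f1 = 2 * precision * recall / (precision + recall)

print(micro_f1)                                   # pooled ("micro") F1
print(f1_score(y_true, y_pred, average='micro'))  # same value from sklearn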
I had the same issue, so I investigated and came up with this:
Just thinking about the theory, it is impossible that accuracy and the F1 score are the same for every single dataset. The reason is that the F1 score is independent of the true negatives, while accuracy is not. By taking a dataset where f1 == acc and adding true negatives to it, you get f1 != acc.
>>> from sklearn.metrics import accuracy_score as acc
>>> from sklearn.metrics import f1_score as f1
>>> y_pred = [0, 1, 1, 0, 1, 0]
>>> y_true = [0, 1, 1, 0, 0, 1]
>>> acc(y_true, y_pred)
0.6666666666666666
>>> f1(y_true, y_pred)
0.6666666666666666
>>> y_true = [0, 1, 1, 0, 1, 0, 0, 0, 0]
>>> y_pred = [0, 1, 1, 0, 0, 1, 0, 0, 0]
>>> acc(y_true, y_pred)
0.7777777777777778
>>> f1(y_true, y_pred)
0.6666666666666666
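Tying this back to the original question: the f1 calls above use the default average='binary', which scores only class 1, so it diverges from accuracy once the extra true negatives are added. Micro-averaging over both classes restores the equality, as a quick follow-up (continuing with the second pair of y_true / y_pred) shows:

print(acc(y_true, y_pred))                  # ~0.7778
print(f1(y_true, y_pred, average='micro'))  # ~0.7778 -- micro F1 tracks accuracy again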