 

How to tell scikit-learn for which label the F-1/precision/recall score is given (in binary classification)?

As explained in this article, the F1 score (and the precision and recall it is built from) depends on which class is treated as positive. For example, suppose I have a skewed dataset with 1% of labels in category A and 99% in category B. If category B is treated as the positive class and I classify every test item as positive, my F1 score will be very good, even though the classifier never finds category A. How do I tell scikit-learn which category is the positive class in a binary classification? (If helpful, I can provide code.)

You_got_it asked Dec 15 '15 01:12



1 Answer

For binary classification, sklearn.metrics.f1_score assumes by default that 1 is the positive class and 0 is the negative class. If you follow that convention (0 for category B and 1 for category A), it should give you the desired behavior. You can override the default by passing the pos_label keyword argument to f1_score, setting it to whichever label you want treated as positive.
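As a rough illustration of the point above, here is a minimal sketch on made-up data. The labels, the 1%/99% split, and the all-B "classifier" are assumptions chosen to mirror the question, not anything from a real dataset. It also assumes a scikit-learn version that supports the zero_division argument (0.22 or later); precision_score and recall_score accept pos_label in the same way.

```python
from sklearn.metrics import f1_score

# Toy skewed data (hypothetical): 1 = category A (rare), 0 = category B (common).
y_true = [1] + [0] * 99

# A degenerate classifier that predicts category B for every item.
y_pred = [0] * 100

# Default behavior (pos_label=1): category A is the positive class.
# No positives are predicted, so precision is undefined; zero_division=0
# reports it as 0 instead of emitting a warning.
f1_a = f1_score(y_true, y_pred, zero_division=0)

# Override: treat category B (label 0) as the positive class.
f1_b = f1_score(y_true, y_pred, pos_label=0, zero_division=0)

print(f1_a)  # 0.0
print(f1_b)  # about 0.995
```

The same predictions score 0.0 or roughly 0.995 depending only on pos_label, which is why it is worth setting it explicitly when the class of interest is not encoded as 1.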

See: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html

David Maust answered Oct 05 '22 22:10
