
Scikit: calculate precision and recall using cross_val_score function

I'm using scikit-learn to perform a logistic regression on spam/ham data. X_train is my training data and y_train holds the labels ('spam' or 'ham'). I trained my LogisticRegression this way:

from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

If I want to get the accuracies for a 10-fold cross-validation, I just write:

accuracy = cross_val_score(classifier, X_train, y_train, cv=10)

I thought it was also possible to calculate the precision and recall for each fold by simply adding one parameter, this way:

precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring='precision')
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring='recall')

But it results in a ValueError:

ValueError: pos_label=1 is not a valid label: array(['ham', 'spam'], dtype='|S4') 

Is it related to the data (should I binarize the labels?), or did they change the cross_val_score function?

Thank you in advance!

Anil Narassiguin asked Dec 08 '14 11:12





1 Answer

To compute recall and precision, the labels do indeed have to be binarized, like this:

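from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
# Classes are sorted, so 'ham' -> 0 and 'spam' -> 1; ravel() flattens the (n, 1) column.
# Pass y_train_bin to cross_val_score in place of y_train.
y_train_bin = lb.fit_transform(y_train).ravel()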
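For reference, here is a minimal end-to-end sketch of how the binarized labels might be used with cross_val_score. The synthetic X_train and y_train below are stand-ins invented for illustration, not the spam/ham data from the question. The second option, building a scorer with make_scorer and pos_label='spam', is an alternative that keeps the string labels; it is not what the answer above describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelBinarizer

# Synthetic stand-in for the spam/ham data (assumption: not the asker's data).
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 5))
y_train = np.where(X_train[:, 0] + 0.5 * rng.normal(size=200) > 0, "spam", "ham")

classifier = LogisticRegression()

# Option 1: binarize the labels. Classes are sorted, so 'ham' -> 0 and 'spam' -> 1,
# which matches the default pos_label=1 used by scoring='precision' / 'recall'.
y_bin = LabelBinarizer().fit_transform(y_train).ravel()
precision = cross_val_score(classifier, X_train, y_bin, cv=10, scoring="precision")
recall = cross_val_score(classifier, X_train, y_bin, cv=10, scoring="recall")

# Option 2: keep the string labels and name the positive class explicitly.
spam_precision = make_scorer(precision_score, pos_label="spam")
precision_str = cross_val_score(
    classifier, X_train, y_train, cv=10, scoring=spam_precision
)

print(precision.mean(), recall.mean(), precision_str.mean())
```

Both options should score the same folds, since only the encoding of the positive class differs.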

To go further, I was surprised that I didn't have to binarize the labels when calculating the accuracy:

accuracy = cross_val_score(classifier, X_train, y_train, cv=10)

That's simply because the accuracy formula doesn't depend on which class is considered positive or negative: (TP + TN) / (TP + TN + FP + FN). Swapping the two classes exchanges TP with TN and FP with FN, so the value is unchanged. That is not the case for recall, precision and F1.
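A small worked example may make this concrete. The labels and predictions below are made up for illustration, and the helper functions are hypothetical names, not part of scikit-learn. Accuracy comes out the same whichever class is treated as positive, while precision and recall change.

```python
def confusion_counts(y_true, y_pred, pos_label):
    """Count TP, TN, FP, FN treating pos_label as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == pos_label and p == pos_label for t, p in pairs)
    tn = sum(t != pos_label and p != pos_label for t, p in pairs)
    fp = sum(t != pos_label and p == pos_label for t, p in pairs)
    fn = sum(t == pos_label and p != pos_label for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, tn, fp, fn):
    return tp / (tp + fp)


def recall(tp, tn, fp, fn):
    return tp / (tp + fn)


# Made-up labels and predictions (illustrative only).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

spam_counts = confusion_counts(y_true, y_pred, pos_label="spam")  # (2, 4, 1, 1)
ham_counts = confusion_counts(y_true, y_pred, pos_label="ham")    # (4, 2, 1, 1)

print(accuracy(*spam_counts), accuracy(*ham_counts))    # 0.75 in both cases
print(precision(*spam_counts), precision(*ham_counts))  # 2/3 versus 0.8
print(recall(*spam_counts), recall(*ham_counts))        # 2/3 versus 0.8
```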

Anil Narassiguin answered Sep 23 '22 22:09
