I'm using scikit-learn to perform a logistic regression on spam/ham data. X_train is my training data and y_train the labels ('spam' or 'ham'), and I trained my LogisticRegression this way:
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
If I want to get the accuracies for a 10-fold cross-validation, I just write:
from sklearn.model_selection import cross_val_score
accuracy = cross_val_score(classifier, X_train, y_train, cv=10)
I thought it was also possible to calculate the precision and recall by simply adding one parameter, this way:
precision = cross_val_score(classifier, X_train, y_train, cv=10, scoring='precision')
recall = cross_val_score(classifier, X_train, y_train, cv=10, scoring='recall')
But it results in a ValueError:
ValueError: pos_label=1 is not a valid label: array(['ham', 'spam'], dtype='|S4')
Is it related to the data (should I binarize the labels?), or did they change the cross_val_score function?
Thank you in advance!
The precision is the ratio tp / (tp + fp), where tp is the number of true positives and fp the number of false positives; intuitively, it is the ability of the classifier not to label a negative sample as positive. The recall is the ratio tp / (tp + fn), where fn is the number of false negatives; intuitively, it is the ability of the classifier to find all the positive samples.
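To make those formulas concrete, here is a minimal sketch (the label vectors are made up for the example) using scikit-learn's precision_score and recall_score:
from sklearn.metrics import precision_score, recall_score
# Hypothetical true and predicted labels for one fold (1 = spam, 0 = ham)
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
# precision = tp / (tp + fp) = 2 / (2 + 1)
print(precision_score(y_true, y_pred))  # 0.666...
# recall = tp / (tp + fn) = 2 / (2 + 1)
print(recall_score(y_true, y_pred))  # 0.666...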
The cross_val_score() function performs the evaluation: it takes the estimator, the dataset and the cross-validation configuration, and returns an array with the score computed on each fold. It lives in the sklearn.model_selection module, defaults to the estimator's default score (accuracy for classifiers), and is typically called with four arguments: the estimator, X, y and cv.
Can I train my model using cross_val_score? A common question is whether cross_val_score can also serve to train the final model. Unfortunately, it cannot: cross_val_score assesses a model and its parameters by fitting fresh clones of the estimator internally and discarding them, so final training remains a separate step.
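Since cross_val_score does not leave you with a fitted model, a minimal sketch of the usual workflow (assuming the same classifier, X_train and y_train as above) is to evaluate first and then fit the final model separately:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
classifier = LogisticRegression()
# Evaluation only: fits one clone of the estimator per fold, then discards them
scores = cross_val_score(classifier, X_train, y_train, cv=10)
print(scores.mean(), scores.std())
# Final training is a separate fit on the full training set
classifier.fit(X_train, y_train)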
To compute the recall and precision, the labels indeed have to be binarized, this way:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit(y_train)
y_train_bin = lb.transform(y_train).ravel()  # 'ham' -> 0, 'spam' -> 1; ravel() flattens the (n, 1) column
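With the binarized labels (y_train_bin is the variable introduced in the snippet above), the two calls from the question then work as expected, with 'spam' (encoded as 1) treated as the positive class:
precision = cross_val_score(classifier, X_train, y_train_bin, cv=10, scoring='precision')
recall = cross_val_score(classifier, X_train, y_train_bin, cv=10, scoring='recall')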
To go further, I was surprised that I didn't have to binarize the labels when I wanted to calculate the accuracy:
accuracy = cross_val_score(classifier, X_train, y_train, cv=10)
That's just because the accuracy formula doesn't need to know which class is considered positive or negative: (TP + TN) / (TP + TN + FN + FP). We can see that TP and TN are exchangeable, which is not the case for recall, precision and f1.
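A small sketch makes that asymmetry concrete (the labels are made up for the example): swapping which class counts as positive leaves the accuracy untouched but changes precision and recall:
from sklearn.metrics import accuracy_score, precision_score, recall_score
y_true = ['spam', 'spam', 'ham', 'ham', 'ham']
y_pred = ['spam', 'ham', 'ham', 'ham', 'spam']
# Accuracy ignores which class is "positive": 3 correct out of 5 either way
print(accuracy_score(y_true, y_pred))  # 0.6
# Precision and recall depend on the choice of positive class
print(precision_score(y_true, y_pred, pos_label='spam'))  # 1/2 = 0.5
print(recall_score(y_true, y_pred, pos_label='spam'))  # 1/2 = 0.5
print(precision_score(y_true, y_pred, pos_label='ham'))  # 2/3 = 0.67
print(recall_score(y_true, y_pred, pos_label='ham'))  # 2/3 = 0.67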