
Using a confusion matrix as the scoring metric in cross-validation in scikit-learn

I am creating a pipeline in scikit-learn:

    pipeline = Pipeline([
        ('bow', CountVectorizer()),
        ('classifier', BernoulliNB()),
    ])

and computing the accuracy using cross-validation:

    scores = cross_val_score(
        pipeline,            # steps to convert raw messages into models
        train_set,           # training data
        label_train,         # training labels
        cv=5,                # 5-fold CV: train on 4 folds, score on the remaining 1
        scoring='accuracy',  # which scoring metric?
        n_jobs=-1,           # -1 = use all cores = faster
    )

How can I report a confusion matrix instead of 'accuracy'?

user128751 asked Oct 15 '16 08:10


People also ask

Does Scikit-learn provide support for cross validation techniques?

To enjoy the benefits of cross-validation you don't have to split the data manually. scikit-learn offers two functions for quick evaluation using cross-validation: cross_val_score returns an array of model scores, one per fold, and cross_validate additionally reports fit and score times.
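As a rough illustration of the difference between the two functions, here is a minimal sketch. The synthetic dataset from make_classification and the LogisticRegression estimator are assumptions chosen only to keep the example self-contained; they are not part of the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_validate

# Synthetic data and estimator are illustrative assumptions.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# cross_val_score: one score per fold, returned as a NumPy array.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")

# cross_validate: a dict that also holds per-fold fit and score times.
results = cross_validate(clf, X, y, cv=5, scoring="accuracy")

print(scores)
print(sorted(results))  # keys include 'fit_time', 'score_time', 'test_score'
```

Neither function returns predictions, which is why neither can produce a confusion matrix directly.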

How can you calculate accuracy using a confusion matrix?

Several common performance measures can be derived from the confusion matrix. Accuracy gives the overall fraction of samples that the classifier labelled correctly. For a binary problem it is (TP+TN)/(TP+TN+FP+FN), where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.
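The accuracy formula can be checked by hand with a few lines of plain Python. The counts below are made-up numbers for illustration, and accuracy_from_counts is a hypothetical helper, not a scikit-learn function.

```python
def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy = correct predictions / all predictions."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Hypothetical counts for a binary classifier.
tp, tn, fp, fn = 50, 35, 10, 5

accuracy = accuracy_from_counts(tp, tn, fp, fn)
print(accuracy)  # 0.85
```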

How is sklearn accuracy score calculated?

Count the predictions that match the true labels, then divide by the total number of samples.


1 Answer

You could use cross_val_predict (see the scikit-learn docs) instead of cross_val_score.

Instead of doing:

    from sklearn.model_selection import cross_val_score

    scores = cross_val_score(clf, x, y, cv=10)

you can do:

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    # Each sample is predicted by the model trained on the folds that exclude it.
    y_pred = cross_val_predict(clf, x, y, cv=10)
    conf_mat = confusion_matrix(y, y_pred)
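Applied to the pipeline from the question, this might look like the sketch below. The messages and labels are toy stand-ins for train_set and label_train (assumed here to be raw text and class labels), and cv=3 is used only because the toy set is tiny.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Toy stand-ins for the question's train_set / label_train (hypothetical data).
train_set = [
    "win a free prize now", "free cash offer", "claim your free gift",
    "cheap loans available now", "win money today", "exclusive free deal",
    "meeting at noon tomorrow", "see you at lunch", "project update attached",
    "call me when you arrive", "notes from the meeting", "lunch on friday",
]
label_train = ["spam"] * 6 + ["ham"] * 6

pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("classifier", BernoulliNB()),
])

# Out-of-fold predictions: every message is predicted exactly once.
y_pred = cross_val_predict(pipeline, train_set, label_train, cv=3)

# Fix the label order so rows (true) and columns (predicted) are unambiguous.
labels = ["ham", "spam"]
conf_mat = confusion_matrix(label_train, y_pred, labels=labels)
print(conf_mat)

# Accuracy can still be recovered from the matrix: diagonal / total.
print(conf_mat.trace() / conf_mat.sum())
print(accuracy_score(label_train, y_pred))
```

Note that this matrix pools predictions from all folds, so the accuracy derived from it can differ slightly from the mean of the per-fold scores that cross_val_score returns when the folds are not all the same size.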
Xema answered Sep 22 '22 13:09