
scikits confusion matrix with cross validation

I am training an SVM classifier with cross-validation (StratifiedKFold) using the scikits interfaces. For each of the k test sets I get a classification result, and I want a single confusion matrix covering all the results. Scikits has a confusion matrix interface: sklearn.metrics.confusion_matrix(y_true, y_pred). My question is how I should accumulate the y_true and y_pred values. They are NumPy arrays. Should I define the size of the arrays based on my k-fold parameter, and then add the y_true and y_pred from each result to those arrays?

andreSmol asked Mar 16 '12 09:03



1 Answer

You can either build a single aggregate confusion matrix over all folds, or compute one matrix per CV partition and report the mean and the standard deviation (or standard error) of each cell in the matrix as a measure of the variability.
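Both options can be sketched as follows. This is only an illustration, not code from the answer: the iris dataset, the linear SVC, the 5-fold setting, and every variable name are assumptions, and it uses the current sklearn.model_selection module rather than the interface that existed when the question was asked.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Illustrative data and model (assumptions, not from the question).
X, y = load_iris(return_X_y=True)
labels = np.unique(y)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_matrices = []
for train_idx, test_idx in skf.split(X, y):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # Passing labels= keeps every per-fold matrix the same shape,
    # even if a class happens to be missing from one fold.
    fold_matrices.append(confusion_matrix(y[test_idx], y_pred, labels=labels))

fold_matrices = np.array(fold_matrices)   # shape: (k, n_classes, n_classes)

aggregate_cm = fold_matrices.sum(axis=0)  # option 1: one matrix over all folds
mean_cm = fold_matrices.mean(axis=0)      # option 2: per-cell mean ...
std_cm = fold_matrices.std(axis=0)        # ... and per-cell standard deviation

print(aggregate_cm)
print(mean_cm)
print(std_cm)
```

Nothing has to be pre-sized from k here: the per-fold results go into a Python list and are stacked once at the end. Collecting the per-fold y_true and y_pred arrays in lists, joining them with np.concatenate, and calling confusion_matrix once gives the same aggregate matrix.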

For the classification report, the code would need to be modified to accept 2-dimensional inputs, so that the predictions for each CV partition can be passed in and the mean score and standard deviation computed for each class.
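Until the report itself accepts per-partition input, a similar per-class summary can be assembled by hand. The sketch below is one possible approach under stated assumptions, not a change to classification_report: it calls precision_recall_fscore_support on each fold and averages the results. The dataset, model, and variable names are again illustrative, and the zero_division argument assumes a reasonably recent scikit-learn release.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Illustrative data and model (assumptions, not from the answer).
X, y = load_iris(return_X_y=True)
labels = np.unique(y)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

per_fold = []
for train_idx, test_idx in skf.split(X, y):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # With the default average=None, each returned array has one entry per class.
    precision, recall, f1, _support = precision_recall_fscore_support(
        y[test_idx], y_pred, labels=labels, zero_division=0
    )
    per_fold.append(np.vstack([precision, recall, f1]))

scores = np.array(per_fold)        # shape: (k, 3 metrics, n_classes)
mean_scores = scores.mean(axis=0)  # per-class mean over the folds
std_scores = scores.std(axis=0)    # per-class standard deviation over the folds

for j, label in enumerate(labels):
    print(
        f"class {label}: "
        f"precision {mean_scores[0, j]:.3f} +/- {std_scores[0, j]:.3f}, "
        f"recall {mean_scores[1, j]:.3f} +/- {std_scores[1, j]:.3f}, "
        f"f1 {mean_scores[2, j]:.3f} +/- {std_scores[2, j]:.3f}"
    )
```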

ogrisel answered Oct 22 '22 06:10
