I'm trying to evaluate multiple machine learning algorithms with sklearn against several metrics (accuracy, recall, precision, and maybe more).
From what I understand of the documentation and the source code (I'm using sklearn 0.17), the cross_val_score function only accepts one scorer per execution. So to calculate multiple scores, I have to either:
- implement my own (time-consuming and error-prone) scorer, or
- execute cross_val_score multiple times, once per metric.

I've taken the second approach with this code:
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import cross_val_score
import time
from sklearn.datasets import load_iris

iris = load_iris()

models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
names = ["Naive Bayes", "Decision Tree", "SVM"]

for model, name in zip(models, names):
    print name
    start = time.time()
    for score in ["accuracy", "precision", "recall"]:
        print score,
        print " : ",
        print cross_val_score(model, iris.data, iris.target, scoring=score, cv=10).mean()
    print time.time() - start
And I get this output:
Naive Bayes
accuracy  :  0.953333333333
precision  :  0.962698412698
recall  :  0.953333333333
0.0383198261261
Decision Tree
accuracy  :  0.953333333333
precision  :  0.958888888889
recall  :  0.953333333333
0.0494720935822
SVM
accuracy  :  0.98
precision  :  0.983333333333
recall  :  0.98
0.063080072403
Which is OK, but it's slow on my own data. How can I measure all the scores in a single run?
The cross_validate function differs from cross_val_score in two ways:

- It allows specifying multiple metrics for evaluation.
- It returns a dict containing fit-times, score-times (and optionally training scores as well as fitted estimators) in addition to the test scores.
cross_val_score: Evaluate a score by cross-validation.
"cross_val_score" splits the data into say 5 folds. Then for each fold it fits the data on 4 folds and scores the 5th fold. Then it gives you the 5 scores from which you can calculate a mean and variance for the score. You crossval to tune parameters and get an estimate of the score.
cross_val_score is a function which evaluates a data and returns the score. On the other hand, KFold is a class, which lets you to split your data to K folds.
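The split-fit-score loop described above can be sketched by hand with KFold. This is a rough equivalent of what cross_val_score does internally, written against the modern sklearn.model_selection API (the classifier and fold count here are just for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
X, y = iris.data, iris.target

scores = []
# KFold yields index arrays; fit on the train split, score the held-out fold
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = GaussianNB()
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(np.mean(scores), np.std(scores))
```

Each entry of scores is one fold's accuracy, which is exactly the array cross_val_score would return.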
Since this post was written, scikit-learn has been updated, making this answer obsolete; see the much cleaner solution below.
You can write your own scoring function to capture all three pieces of information; however, a scoring function for cross-validation must return a single number in scikit-learn (this is likely for compatibility reasons). Below is an example where the scores for each cross-validation slice are printed to the console, and the returned value is just the sum of the three metrics. If you want to return all of these values, you would have to modify cross_val_score (line 1351 of cross_validation.py) and _score (line 1601 of the same file).
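Rather than patching the library source, one common workaround (not from the original answer, sketched here against the modern sklearn.model_selection API) is a callable scorer that records every fold's metrics in a side list while still returning a single number:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
fold_metrics = []  # side channel: one (accuracy, precision, recall) tuple per fold

def recording_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    a = accuracy_score(y, y_pred)
    p = precision_score(y, y_pred, average='macro')
    r = recall_score(y, y_pred, average='macro')
    fold_metrics.append((a, p, r))
    return a  # cross_val_score itself still only sees a single number

cross_val_score(GaussianNB(), iris.data, iris.target, scoring=recording_scorer, cv=10)
for a, p, r in fold_metrics:
    print(a, p, r)
```

The drawback is the global side list, which you would need to clear between runs; it avoids touching scikit-learn internals, though.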
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.cross_validation import cross_val_score
import time
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_score, recall_score

iris = load_iris()

models = [GaussianNB(), DecisionTreeClassifier(), SVC()]
names = ["Naive Bayes", "Decision Tree", "SVM"]

def getScores(estimator, x, y):
    yPred = estimator.predict(x)
    return (accuracy_score(y, yPred),
            precision_score(y, yPred, pos_label=3, average='macro'),
            recall_score(y, yPred, pos_label=3, average='macro'))

def my_scorer(estimator, x, y):
    a, p, r = getScores(estimator, x, y)
    print a, p, r
    return a + p + r

for model, name in zip(models, names):
    print name
    start = time.time()
    m = cross_val_score(model, iris.data, iris.target, scoring=my_scorer, cv=10).mean()
    print '\nSum:', m, '\n\n'
    print 'time', time.time() - start, '\n\n'
Which gives:
Naive Bayes
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
0.866666666667 0.904761904762 0.866666666667
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0

Sum: 2.86936507937

time 0.0249638557434

Decision Tree
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
0.866666666667 0.866666666667 0.866666666667
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
1.0 1.0 1.0

Sum: 2.86555555556

time 0.0237860679626

SVM
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
0.933333333333 0.944444444444 0.933333333333
0.933333333333 0.944444444444 0.933333333333
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0

Sum: 2.94333333333

time 0.043044090271
As of scikit-learn 0.19.0 the solution becomes much easier:
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
clf = SVC()
scoring = {'acc': 'accuracy',
           'prec_macro': 'precision_macro',
           'rec_micro': 'recall_macro'}
scores = cross_validate(clf, iris.data, iris.target, scoring=scoring, cv=5, return_train_score=True)

print(scores.keys())
print(scores['test_acc'])
Which gives:
['test_acc', 'score_time', 'train_acc', 'fit_time', 'test_rec_micro', 'train_rec_micro', 'train_prec_macro', 'test_prec_macro']
[ 0.96666667  1.          0.96666667  0.96666667  1.        ]
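The scoring dict above uses built-in metric names; custom metrics can be plugged into the same dict via make_scorer. A minimal sketch (the f1_macro key and the choice of f1_score are illustrative, not from the original answer):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import make_scorer, f1_score
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

iris = load_iris()

# make_scorer wraps a metric function so cross_validate can call it per fold;
# keyword arguments (here average='macro') are forwarded to the metric.
scoring = {
    'acc': 'accuracy',                                   # built-in name
    'f1_macro': make_scorer(f1_score, average='macro'),  # custom scorer
}

scores = cross_validate(SVC(), iris.data, iris.target, scoring=scoring, cv=5)
print(sorted(scores.keys()))
print(scores['test_f1_macro'].mean())
```

Each key of the scoring dict becomes a test_<key> entry in the returned dict, so built-in and custom metrics can be mixed freely.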