
scikit-learn: clf.fit / score model accuracy

I'm building a model clf, say:

clf = MultinomialNB()
clf.fit(x_train, y_train)

Then I want to check the model's accuracy using score:

clf.score(x_train, y_train)

The result was 0.92.

My goal is to evaluate the model on the test set, so I use:

clf.score(x_test, y_test)

This gave me 0.77, so I expected the code below to give the same result:

clf.fit(X_train, y_train).score(X_test, y_test)

This gave me 0.54. Can someone help me understand why the two scores differ (0.77 vs. 0.54)?

JPC asked Oct 16 '13 16:10





1 Answer

You should get the same result if x_train, y_train, x_test and y_test are the same in both cases. Here is an example using the iris dataset; as you can see, both methods give the same result.

>>> from sklearn.naive_bayes import MultinomialNB
>>> # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.datasets import load_iris
>>> # prepare dataset (first two features only)
>>> iris = load_iris()
>>> X = iris.data[:, :2]
>>> y = iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
>>> # model
>>> clf1 = MultinomialNB()
>>> clf2 = MultinomialNB()
>>> print(id(clf1), id(clf2))  # two different instances
4337289232 4337289296
>>> clf1.fit(X_train, y_train)
>>> print(clf1.score(X_test, y_test))  # exact value depends on the random split
0.633333333333
>>> print(clf2.fit(X_train, y_train).score(X_test, y_test))
0.633333333333
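
If you want a reproducible check, here is a minimal sketch, assuming scikit-learn 0.20 or later (where train_test_split lives in sklearn.model_selection). It fixes random_state so the split is the same on every run, then compares the two-step form with the chained form. It also scores a model on a second, different split to show how the number can change when the data is not the same. The second split and the `*_b` variable names are hypothetical, added only for illustration. One thing worth checking in the question: Python names are case-sensitive, so x_train and X_train may refer to different arrays.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

iris = load_iris()
X, y = iris.data, iris.target

# Fixed random_state makes the split reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two-step form: fit first, then score.
clf = MultinomialNB()
clf.fit(X_train, y_train)
score_two_step = clf.score(X_test, y_test)

# Chained form: fit() returns the estimator itself, so score() can follow.
score_chained = MultinomialNB().fit(X_train, y_train).score(X_test, y_test)

# Same data and same (deterministic) model, so the scores are identical.
print(score_two_step == score_chained)

# Hypothetical second split, standing in for "different variables".
# The score here is computed on different data and need not match.
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=1
)
score_other_split = MultinomialNB().fit(X_train_b, y_train_b).score(X_test_b, y_test_b)
print(score_two_step, score_other_split)
```

If the first comparison holds for your own data but your two numbers still differ, the arrays passed to fit or score are most likely not the same objects in both snippets.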
jabaldonedo answered Oct 22 '22 07:10
