 

Difference between .score() and .predict() in the sklearn library?

I have instantiated an SVC object using the sklearn library with the following code:

from sklearn import svm

clf = svm.SVC(kernel='linear', C=1, cache_size=1000, max_iter=-1, verbose=True)

I then fit data to it using:

model = clf.fit(X_train, y_train)

where X_train is a (301, 60) ndarray and y_train is a (301,) ndarray (y_train consists of the class labels "1", "2" and "3").
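A minimal way to reproduce this setup, assuming random stand-in data (the real X_train and y_train are not shown) and integer labels 1 to 3 in place of the question's labels. It also shows that fit() returns the fitted estimator itself, so model and clf are the same object, and that predict() returns a 1-D array. verbose is left off here only to keep the output quiet.

```python
import numpy as np
from sklearn import svm

# Hypothetical stand-in data with the shapes described in the question:
# 301 samples, 60 features, integer class labels 1, 2 and 3.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(301, 60))
y_train = rng.integers(1, 4, size=301)  # 1-D, shape (301,)

# Same hyperparameters as the question, minus verbose output.
clf = svm.SVC(kernel='linear', C=1, cache_size=1000, max_iter=-1)
model = clf.fit(X_train, y_train)

# fit() returns the estimator itself, so both names refer to one object.
print(model is clf)                   # True
print(model.predict(X_train).shape)   # (301,)
```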

Before I stumbled across the .score() method, I was using the following to determine the accuracy of my model on the training set:

prediction = np.divide((y_train == model.predict(X_train)).sum(), y_train.size, dtype = float)

which gives a result of approximately 62%.

However, when using the model.score(X_train, y_train) method, I get a result of approximately 83%.

Could anyone explain why this happens? As far as I understand, the two approaches should return the same result.

ADDENDUM:

The first 10 values of y_true are:

  • 2, 3, 1, 3, 2, 3, 2, 2, 3, 1, ...

Whereas for y_pred (when using model.predict(X_train)), they are:

  • 2, 3, 3, 2, 2, 3, 2, 3, 3, 3, ...
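As a sanity check, the question's formula can be applied to just the ten values quoted above. With both arrays 1-D, it agrees with the simpler np.mean form: 6 of the 10 labels match, giving 0.6. This only illustrates the arithmetic on the quoted prefix; it says nothing about the full 301-sample accuracy.

```python
import numpy as np

# First ten labels quoted in the addendum (integers assumed for illustration).
y_true = np.array([2, 3, 1, 3, 2, 3, 2, 2, 3, 1])
y_pred = np.array([2, 3, 3, 2, 2, 3, 2, 3, 3, 3])

# The question's formula: number of matches divided by number of samples.
manual = np.divide((y_true == y_pred).sum(), y_true.size, dtype=float)

# Equivalent, more idiomatic form: the mean of a boolean array.
mean_acc = np.mean(y_true == y_pred)

print(manual, mean_acc)  # 0.6 0.6
```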
asked Jan 22 '15 at 16:01 by user1182556





1 Answer

Because your y_train has shape (301, 1) rather than (301,), NumPy broadcasts the comparison, so

(y_train == model.predict(X_train)).shape == (301, 301)

which is not what you intended. The correct version of your code would be

np.mean(y_train.ravel() == model.predict(X_train))

which will give the same result as

model.score(X_train, y_train)
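A small sketch of both points, assuming random stand-in data in which y_train is deliberately stored as a (301, 1) column vector. Comparing that column vector with the 1-D output of predict() broadcasts to a (301, 301) array. After ravel(), the manual accuracy matches model.score(), which for sklearn classifiers returns the mean accuracy, and sklearn.metrics.accuracy_score.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import accuracy_score

# Hypothetical synthetic data; y_train is a (301, 1) column vector on purpose.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(301, 60))
y_train = rng.integers(1, 4, size=(301, 1))

# Give fit() the 1-D target it expects.
model = svm.SVC(kernel='linear', C=1).fit(X_train, y_train.ravel())
y_pred = model.predict(X_train)  # shape (301,)

# The original comparison broadcasts (301, 1) against (301,) into (301, 301).
print((y_train == y_pred).shape)  # (301, 301)

# Flattening y_train first gives the intended element-wise comparison.
manual = np.mean(y_train.ravel() == y_pred)
via_score = model.score(X_train, y_train)  # mean accuracy for classifiers
via_metric = accuracy_score(y_train.ravel(), y_pred)

print(manual, via_score, via_metric)  # all three values agree
```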
answered Sep 23 '22 at 14:09 by Andreas Mueller
