I have instantiated an SVC object from scikit-learn with the following code:
clf = svm.SVC(kernel='linear', C=1, cache_size=1000, max_iter=-1, verbose=True)
I then fit data to it using:
model = clf.fit(X_train, y_train)
where X_train is a (301, 60) ndarray and y_train a (301,) ndarray (y_train consisting of the class labels "1", "2" and "3").
Now, before I stumbled across the .score() method, I was determining the accuracy of my model on the training set with the following:
prediction = np.divide((y_train == model.predict(X_train)).sum(), y_train.size, dtype = float)
which gives a result of approximately 62%.
However, when using the model.score(X_train, y_train) method I get a result of approximately 83%.
Could anyone explain why this is the case? As far as I understand, the two should return the same result.
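For context, my understanding is that SVC.score computes plain accuracy, so it should match a manual element-wise comparison when the shapes line up. A minimal sketch with a synthetic dataset (make_classification here just stands in for my real X_train/y_train):

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification

# Toy data with the same shapes as in the question: (301, 60) features,
# (301,) labels drawn from three classes.
X, y = make_classification(n_samples=301, n_features=60,
                           n_informative=10, n_classes=3,
                           random_state=0)

clf = svm.SVC(kernel='linear', C=1)
model = clf.fit(X, y)

# Manual accuracy: fraction of training labels the model reproduces.
manual = np.mean(y == model.predict(X))

# With y of shape (n,), this agrees with model.score exactly.
print(manual == model.score(X, y))  # True
```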
ADDENDUM:
The first 10 values of y_true are:
Whereas for y_pred (when using model.predict(X_train)), they are:
Because your y_train is (301, 1) and not (301,), numpy does broadcasting, so

(y_train == model.predict(X_train)).shape == (301, 301)

which is not what you intended. The correct version of your code would be
np.mean(y_train.ravel() == model.predict(X_train))
which will give the same result as
model.score(X_train, y_train)
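The broadcasting pitfall can be reproduced with plain NumPy (a minimal sketch; the shapes mirror the question, and the random labels are only placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Column vector of labels, shape (301, 1), vs. flat predictions, shape (301,)
y_train = rng.integers(1, 4, size=(301, 1))
y_pred = rng.integers(1, 4, size=301)

# Broadcasting compares every label against every prediction,
# producing a (301, 301) matrix rather than 301 element-wise results.
wrong = (y_train == y_pred)
print(wrong.shape)  # (301, 301)

# Flattening y_train restores the intended element-wise comparison.
right = (y_train.ravel() == y_pred)
print(right.shape)  # (301,)
print(right.mean())  # the actual match rate
```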