
Understanding decision_function values

I'm currently in the middle of my first machine-learning project, and so far I don't quite understand the scale of the values returned by decision_function(X), nor how to interpret them.

Based on the sklearn documentation decision_function(X) is meant to:

Predict confidence scores for samples.

Nonetheless, when running the following script:

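import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# fetch_mldata has been removed from scikit-learn; fetch_openml replaces it.
# The sample order differs from the old 'MNIST original' copy, so the digits
# picked by index and the printed score will not match the value quoted below.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

classifier = SGDClassifier(random_state=42, max_iter=5)

# OpenML returns the labels as strings, so cast them to integers.
X, y = mnist["data"], mnist["target"].astype(np.uint8)
some_digit = X[36001]
some_digit_image = some_digit.reshape(28, 28)

# The first 60000 samples are the training set, the last 10000 the test set.
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Shuffle the training set (unseeded, so results vary between runs).
random_order = np.random.permutation(60000)
X_train, y_train = X_train[random_order], y_train[random_order]

# Binary targets: True for the digit 5, False for every other digit.
y_test_5 = (y_test == 5)
y_train_5 = (y_train == 5)

classifier.fit(X_train, y_train_5)
print(classifier.decision_function([X_test[1]]))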

It prints [-289809.39489525] for the decision_function. At this point I'm not sure how to read or evaluate such values (I was expecting to see percentages). If anyone could explain what these readings mean, that would be greatly appreciated.

Thank you very much in advance.

Nazim Kerimbekov asked Jun 21 '18 22:06





1 Answer

How to get probabilities (percentages)?

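Use the predict_proba method. For SGDClassifier it is only available when the model is trained with a probabilistic loss, such as loss="log_loss" (called "log" in older scikit-learn releases) or loss="modified_huber"; with the default hinge loss the classifier does not provide probability estimates.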
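As a rough illustration, the sketch below trains an SGDClassifier with loss="modified_huber" and compares predict_proba with decision_function. The synthetic dataset from make_classification is an assumption made to keep the example small; it stands in for the MNIST "is it a 5?" task and is not the asker's data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Assumption: a small synthetic binary problem instead of MNIST.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# modified_huber is one of the losses for which SGDClassifier
# exposes predict_proba; the default hinge loss does not.
clf = SGDClassifier(loss="modified_huber", random_state=42,
                    max_iter=1000, tol=1e-3)
clf.fit(X, y)

# One row per sample, one column per class (ordered as in clf.classes_).
proba = clf.predict_proba(X[:3])

# One unbounded raw score per sample.
scores = clf.decision_function(X[:3])

print(proba)
print(scores)
```

Each row of proba sums to 1, whereas the entries of scores can be any real number.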

What is decision_function ?

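Since SGDClassifier is a linear model, decision_function returns the raw score <w,x> + b or, in scikit-learn attribute names, <coef_,x> + intercept_. This is the signed distance to the separating hyperplane multiplied by the norm of w: the sign tells you which side of the hyperplane the sample falls on (positive means the positive class, here "is a 5"), and the magnitude grows as the sample moves further from the boundary. The score is unbounded and has no fixed scale; it depends on the size of the weights and of the features (here, unscaled pixel intensities), so a value such as -289809 reads as a negative prediction far from the boundary, not as a percentage.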
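The relationship between decision_function, coef_ and intercept_ can be checked numerically. The following is a minimal sketch on a synthetic dataset (an assumption for brevity, not the MNIST data from the question); the variable names manual, distance and predicted_positive are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Assumption: a small synthetic binary problem instead of MNIST.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = SGDClassifier(random_state=0, max_iter=1000, tol=1e-3).fit(X, y)

# Scores as reported by scikit-learn.
scores = clf.decision_function(X)

# The same scores computed by hand: <coef_, x> + intercept_.
# In the binary case coef_ has shape (1, n_features).
manual = X @ clf.coef_.ravel() + clf.intercept_[0]

# Dividing by the norm of the weights gives the geometric signed distance.
distance = manual / np.linalg.norm(clf.coef_)

# A positive score means the sample is assigned to clf.classes_[1].
predicted_positive = scores > 0

print(np.allclose(scores, manual))
print(scores[:3], distance[:3])
```

The sign of each score determines the predicted class, while its magnitude is only comparable between samples scored by the same fitted model.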

Jan K answered Oct 06 '22 08:10
