Understanding decision_function values

I'm currently in the middle of my first machine-learning project, and so far I don't quite understand the scale of the values I get from decision_function(X), nor how to interpret them.

According to the sklearn documentation, decision_function(X) is meant to:

Predict confidence scores for samples.

Nonetheless, when running the following script:

from sklearn.datasets import fetch_openml
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# fetch_mldata was removed in scikit-learn 0.22; fetch_openml replaces it.
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

classifier = SGDClassifier(random_state=42, max_iter=5)

# OpenML stores the labels as strings, so cast them to integers.
X, y = mnist["data"], mnist["target"].astype(np.uint8)
some_digit = X[36001]
some_digit_image = some_digit.reshape(28, 28)

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

# Shuffle the training set.
random_order = np.random.permutation(60000)
X_train, y_train = X_train[random_order], y_train[random_order]

# Binary targets: is the digit a 5 or not?
y_test_5 = (y_test == 5)
y_train_5 = (y_train == 5)

classifier.fit(X_train, y_train_5)
print(classifier.decision_function([X_test[1]]))

it prints out [-289809.39489525] for the decision_function. At this point I'm not sure how to read or evaluate these values (I was expecting to see percentages). If anyone could explain what these readings mean, that would be greatly appreciated.

Thank you very much in advance.

asked Jun 21 '18 22:06

Nazim Kerimbekov

1 Answer

How to get probabilities (percentages)?

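Use the predict_proba method. Note that SGDClassifier only provides it for a probabilistic loss such as loss="log_loss" or loss="modified_huber"; with the default hinge loss used in the question, it is not available.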

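What is decision_function?

Since SGDClassifier is a linear model, decision_function outputs a score proportional to the signed distance from the sample to the separating hyperplane. This number is simply <w,x> + b or, translated to scikit-learn attribute names, <coef_,x> + intercept_. Its sign gives the predicted class (a negative score such as the one in the question means "not a 5"), and its magnitude grows with the distance from the decision boundary. It is not a percentage and is not bounded.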
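The formula can be checked by hand. The following is a sketch on synthetic data (an assumption, chosen so that it runs without downloading MNIST); it recomputes the score from coef_ and intercept_ and shows how predict follows from the sign of the score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic binary problem standing in for the MNIST "is it a 5?" task.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

clf = SGDClassifier(random_state=42).fit(X, y)

# Score as reported by scikit-learn.
scores = clf.decision_function(X)

# The same score computed by hand: <coef_, x> + intercept_.
manual = X @ clf.coef_.ravel() + clf.intercept_[0]

# Dividing by the norm of the weight vector gives the geometric distance.
distances = scores / np.linalg.norm(clf.coef_)

# predict() returns the positive class exactly when the score is positive.
predicted = clf.classes_[(scores > 0).astype(int)]

print(np.allclose(scores, manual))
print(np.array_equal(predicted, clf.predict(X)))
```

Because the score scales with the feature values, unscaled pixel intensities in the range 0 to 255 can easily produce magnitudes in the hundreds of thousands.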

answered Oct 06 '22 08:10

Jan K

Tags: python, machine-learning, scikit-learn
