I'm studying a scikit-learn example (Classifier comparison) and got confused by <code>predict_proba</code> and <code>decision_function</code>. They plot the classification results by drawing contours using either <code>Z = clf.decision_function()</code> or <code>Z = clf.predict_proba()</code>. What's the difference between these two? Is it the case that each classification method provides one of the two as its score? Which one is more appropriate for interpreting the classification result, and how should I choose between them?

What's the difference between predict_proba and decision_function in scikit-learn?

2 Answers

The former, predict_proba, is a method of a (soft) classifier that outputs the probability of the instance belonging to each of the classes.

The latter, decision_function, returns a signed score related to the distance to the separating hyperplane. For example, an SVM classifier finds hyperplanes that separate the space into regions associated with classification outcomes. Given a point, this function returns a signed value proportional to its distance from those separators (positive on one side, negative on the other).
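Here is a minimal sketch that makes the "distance" interpretation concrete for the linear case. It uses LinearSVC on synthetic data; the dataset and parameters are arbitrary assumptions chosen for illustration. For a linear model, decision_function is the raw score w·x + b. Dividing it by the norm of w gives the signed Euclidean distance to the hyperplane, and the sign alone decides the predicted class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Toy binary problem (synthetic data, purely for illustration)
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)

clf = LinearSVC(dual=False).fit(X, y)

# decision_function is the raw linear score w.x + b, one value per sample
scores = clf.decision_function(X)
w, b = clf.coef_.ravel(), clf.intercept_[0]
assert np.allclose(scores, X @ w + b)

# Dividing by ||w|| turns the raw score into a signed Euclidean distance
distances = scores / np.linalg.norm(w)

# The sign decides the label: positive -> classes_[1], otherwise classes_[0]
pred_from_sign = clf.classes_[(scores > 0).astype(int)]
assert np.array_equal(pred_from_sign, clf.predict(X))

print(scores[:3])
print(distances[:3])
```

For a kernel SVC the same relationship holds, but in the implicit feature space. The raw score is therefore still not a distance you can measure in the input space.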

I'd guess that predict_proba is more useful in your case, in general; the other method is more specific to the algorithm.
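The sketch below compares what the two methods return for one model that has both. It uses synthetic data and a default LogisticRegression, both assumptions for illustration. Each row of predict_proba sums to 1, and its columns follow clf.classes_. For a binary logistic regression, the positive-class probability is just the logistic sigmoid of the decision score, so in this particular model the two methods carry the same information on different scales.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (illustrative assumption)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)        # shape (n_samples, n_classes), columns follow clf.classes_
scores = clf.decision_function(X)   # shape (n_samples,) for a binary problem

# Each row of predict_proba is a probability distribution over the classes
assert np.allclose(proba.sum(axis=1), 1.0)

# For binary logistic regression, P(class 1) = sigmoid(decision score)
assert np.allclose(proba[:, 1], expit(scores))
```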

answered Oct 05 '22 00:10

Ami Tavory

Your example is

    if hasattr(clf, "decision_function"):
        Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    else:
        Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
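Below is a self-contained sketch of the snippet above, assuming two-moons data and an arbitrary grid step. SVC and KNeighborsClassifier are chosen only because the first exposes decision_function and the second does not. The helper name grid_scores is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.3, random_state=0)

# Mesh covering the data; the step h is an arbitrary choice
h = 0.05
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
grid = np.c_[xx.ravel(), yy.ravel()]

def grid_scores(clf):
    """Score every grid point: prefer decision_function, else P(classes_[1])."""
    if hasattr(clf, "decision_function"):
        Z = clf.decision_function(grid)
    else:
        Z = clf.predict_proba(grid)[:, 1]
    return Z.reshape(xx.shape)

Z_svc = grid_scores(SVC(gamma=2, C=1).fit(X, y))        # signed, unbounded scores
Z_knn = grid_scores(KNeighborsClassifier(3).fit(X, y))  # probabilities in [0, 1]
```

Either Z can then be passed to matplotlib's contourf(xx, yy, Z) as in the example. Keep in mind that the two live on different scales: the class boundary is the 0 level of decision_function but the 0.5 level of a probability.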

so the code uses decision_function if it exists. In the SVM case, predict_proba is computed (in the binary case) "using Platt scaling", which is both "expensive" and has "theoretical issues". That is why decision_function is used here. (As @Ami said, this is the margin, i.e. a score proportional to the distance to the hyperplane, which is available without much further computation.)

For SVMs, the scikit-learn documentation advises to "use decision_function instead of predict_proba" when confidence scores are needed but do not have to be probabilities.
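The sketch below shows the practical consequences on synthetic data, which is an assumption for illustration:

* A default SVC does not expose predict_proba at all.
* Setting probability=True adds the Platt-scaling fit, with internal cross-validation, at training time.
* The sign of decision_function matches predict.
* The scikit-learn documentation warns that the argmax of predict_proba can disagree with predict.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Default SVC: no Platt scaling is fitted, so predict_proba is not exposed
plain = SVC().fit(X, y)
assert not hasattr(plain, "predict_proba")

# probability=True fits the extra Platt-scaling step (internal cross-validation)
platt = SVC(probability=True, random_state=0).fit(X, y)
proba = platt.predict_proba(X)

# For a binary SVC the sign of decision_function agrees with predict ...
dec = platt.decision_function(X)
assert np.array_equal(platt.classes_[(dec > 0).astype(int)], platt.predict(X))

# ... whereas argmax(predict_proba) may occasionally disagree with predict
n_disagree = int(np.sum(platt.classes_[proba.argmax(axis=1)] != platt.predict(X)))
print("samples where argmax(predict_proba) != predict:", n_disagree)
```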

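Other estimators have a decision_function too, for example SGDClassifier. There, whether predict_proba is available depends on the loss function: only loss="log_loss" (called "log" in older releases) and loss="modified_huber" support it. decision_function is always available.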
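Here is a sketch of how availability depends on the loss, on synthetic data (an illustrative assumption). With loss="hinge" only decision_function exists. With loss="modified_huber", predict_proba is derived from the decision score. The last check follows the binary formula given in the SGDClassifier.predict_proba docstring, (clip(decision_function(X), -1, 1) + 1) / 2.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Hinge loss (a linear SVM): decision_function only, no probabilities
hinge = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
assert hasattr(hinge, "decision_function")
assert not hasattr(hinge, "predict_proba")

# Modified Huber loss: predict_proba exists and is derived from the decision score
mh = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)
scores = mh.decision_function(X)
proba = mh.predict_proba(X)

# Binary case: the score is clipped to [-1, 1] and rescaled to [0, 1]
assert np.allclose(proba[:, 1], (np.clip(scores, -1, 1) + 1) / 2)
```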

answered Oct 04 '22 23:10

serv-inc

Tags:

scikit-learn

Rosy
