I'm studying a scikit-learn example (Classifier comparison) and got confused with predict_proba and decision_function.
They plot the classification results by drawing the contours using either Z = clf.decision_function() or Z = clf.predict_proba().
What is the difference between these two? Does each classification method provide one or the other as its score?
Which one is more appropriate for interpreting the classification result, and how should I choose between the two?
The predict method is used to predict the actual class, while the predict_proba method can be used to infer the class probabilities (i.e. the probability that a particular data point falls into each of the underlying classes).
predict_proba gives you the probabilities for the target classes (0 and 1 in your case) in array form. The number of probabilities in each row is equal to the number of categories in the target variable (2 in your case).
That is, each row of the output of predict_proba(X_input) consists of 2 columns, one per class, holding the probability of that class.
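To make the difference between predict and predict_proba concrete, here is a minimal sketch. The toy dataset and the choice of LogisticRegression are assumptions made for illustration only; they are not taken from the Classifier comparison example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy binary problem (illustrative assumption, not the data from the question).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

labels = clf.predict(X[:5])        # hard class labels, shape (5,)
proba = clf.predict_proba(X[:5])   # class probabilities, shape (5, 2)

print(clf.classes_)        # the columns of proba follow this class order
print(proba.shape)         # one row per sample, one column per class
print(proba.sum(axis=1))   # each row sums to 1

# For this model, predict returns the class with the highest probability.
from_proba = clf.classes_[proba.argmax(axis=1)]
print(np.array_equal(labels, from_proba))
```

The column order of predict_proba always follows clf.classes_, so column 1 is the probability of the second class listed there.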
decision_function is a method available on some classifier classes (for example SVC and LogisticRegression) in the scikit-learn machine learning framework.
The latter, predict_proba, is a method of a (soft) classifier that outputs the probability of the instance being in each of the classes.
The former, decision_function, finds the distance to the separating hyperplane. For example, an SVM classifier finds hyperplanes separating the space into areas associated with classification outcomes. Given a point, this function returns a signed score that grows with the point's distance from the separator; the sign tells you which side of the separator the point lies on.
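As a rough illustration of the "distance to the hyperplane" idea, the sketch below uses a linear-kernel SVC on a toy dataset (both are assumptions chosen for illustration). For a linear kernel the score is w . x + b, so dividing by the norm of w gives the geometric distance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs (illustrative data only).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

scores = clf.decision_function(X)   # one signed score per sample, shape (100,)

# With a linear kernel the score can be recomputed from the fitted hyperplane.
w, b = clf.coef_[0], clf.intercept_[0]
manual = X @ w + b
print(np.allclose(scores, manual))

# The raw score is not normalized; dividing by ||w|| gives geometric distance.
distances = scores / np.linalg.norm(w)

# In the binary case a positive score corresponds to clf.classes_[1].
pred_from_scores = clf.classes_[(scores > 0).astype(int)]
print(np.array_equal(pred_from_scores, clf.predict(X)))
```

Note that the score is unbounded, unlike a probability, which is why it is harder to compare across different algorithms.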
I'd guess that predict_proba is more useful in your case, and in general; the other method is more specific to the algorithm.
Your example is

    if hasattr(clf, "decision_function"):
        Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    else:
        Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

so the code uses decision_function if it exists. In the SVM case, predict_proba is computed (in the binary case) using Platt scaling, which is both "expensive" and has "theoretical issues". That's why decision_function is used here (as @Ami said, this is the margin, i.e. the distance to the hyperplane, which is accessible without much further computation). In the SVM case, it is advised to use decision_function instead of predict_proba.
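The fallback logic discussed above can be sketched as a self-contained example. The helper name grid_scores, the dataset, the grid bounds, and the two classifiers are assumptions made for illustration; only the hasattr check mirrors the snippet from the example. With default settings (probability=False), SVC does not expose predict_proba at all, so the decision_function branch is the only one available for it.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def grid_scores(clf, xx, yy):
    """Hypothetical helper: score every mesh point, preferring decision_function."""
    grid = np.c_[xx.ravel(), yy.ravel()]
    if hasattr(clf, "decision_function"):
        Z = clf.decision_function(grid)
    else:
        # Probability of the second class in clf.classes_.
        Z = clf.predict_proba(grid)[:, 1]
    return Z.reshape(xx.shape)


# Illustrative data and mesh.
X, y = make_moons(noise=0.3, random_state=0)
xx, yy = np.meshgrid(np.linspace(-2, 3, 50), np.linspace(-1.5, 2, 40))

svc = SVC(gamma=2, C=1).fit(X, y)         # default probability=False
knn = KNeighborsClassifier(3).fit(X, y)   # has no decision_function

print(hasattr(svc, "predict_proba"))      # not exposed without probability=True
print(hasattr(knn, "decision_function"))

Z_svc = grid_scores(svc, xx, yy)   # unbounded margin scores
Z_knn = grid_scores(knn, xx, yy)   # probabilities in [0, 1]
print(Z_svc.shape, Z_knn.shape)
```

Either array can then be passed to a contour plot; the two are on different scales, so the colour maps of different classifiers are not directly comparable.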
Other classifiers have a decision_function too, for example SGDClassifier. There, whether predict_proba is available depends on the loss function, whereas decision_function is universally available.
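A small sketch of the SGDClassifier point, assuming a toy dataset: with the hinge loss only decision_function is exposed, while with the modified_huber loss predict_proba is exposed as well. The specific losses compared here are chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Toy binary problem (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

hinge = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
huber = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)

for name, clf in [("hinge", hinge), ("modified_huber", huber)]:
    print(
        name,
        "decision_function:", hasattr(clf, "decision_function"),
        "predict_proba:", hasattr(clf, "predict_proba"),
    )

# predict_proba is usable only on the model whose loss supports it.
proba = huber.predict_proba(X[:3])
print(proba.shape)
```

This is why a hasattr check on decision_function is a convenient first branch: it succeeds regardless of the loss.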