I'm studying a scikit-learn example (Classifier comparison) and got confused by predict_proba and decision_function.
They plot the classification results by drawing contours using either Z = clf.decision_function() or Z = clf.predict_proba().
What is the difference between these two? Does each classification method have one or the other as its score?
Which one is more appropriate for interpreting the classification result, and how should I choose between the two?
The predict method predicts the actual class, while the predict_proba method returns the class probabilities (i.e. the probability that a particular data point belongs to each of the underlying classes).
predict_proba gives you the probabilities for the target (0 and 1 in your case) in array form. The number of probabilities in each row equals the number of categories in the target variable (2 in your case).
For predict_proba(X_input), each row of the output has 2 columns, one per class, in the order given by clf.classes_.
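To make the relationship between predict and predict_proba concrete, here is a minimal sketch. The synthetic dataset from make_classification and the choice of LogisticRegression are assumptions for illustration only; they are not taken from the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative two-class data (an assumption, not the data from the question).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

labels = clf.predict(X[:5])        # hard class labels, shape (5,)
proba = clf.predict_proba(X[:5])   # one column per class, shape (5, 2)

# Columns follow clf.classes_, so taking the most probable column
# recovers the label that predict returns for this model.
predicted_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print(labels)
print(proba)
```

Each row of proba sums to 1, and proba[:, 1] is the column the plotting code in the question uses for the positive class.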
decision_function is a method available on some scikit-learn classifiers, such as SVC and LogisticRegression.
predict_proba is a method of a (soft) classifier that outputs the probability of the instance being in each of the classes.
decision_function returns a signed score for each point; for a linear model this score is proportional to the distance to the separating hyperplane. For example, an SVM classifier finds hyperplanes separating the space into regions associated with classification outcomes. Given a point, this function reports how far from the separator the point lies and on which side.
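As a rough sketch of what the score means for a linear SVM, the snippet below recomputes decision_function by hand. The make_blobs data and the linear kernel are assumptions chosen so that coef_ and intercept_ exist; with a non-linear kernel there is no coef_ and the score is no longer a plain hyperplane distance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative two-cluster data (an assumption, not from the question).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

# Raw score: one value per sample in the binary case.
scores = clf.decision_function(X)

# For a linear kernel the score is w . x + b ...
manual = X @ clf.coef_.ravel() + clf.intercept_[0]

# ... and dividing by ||w|| gives the signed geometric distance.
distances = scores / np.linalg.norm(clf.coef_)

# The sign of the score selects the class: positive means classes_[1].
labels_from_sign = clf.classes_[(scores > 0).astype(int)]
```

So the value plotted as Z in the contour plot is an unnormalised margin, not a probability, and its scale differs from one classifier to another.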
I'd guess that predict_proba is more useful in your case in general; the other method is more specific to the algorithm.
Your example is

    # Prefer the raw score when the classifier exposes one.
    if hasattr(clf, "decision_function"):
        Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    else:
        # Otherwise fall back to the probability of the positive class.
        Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

so the code uses decision_function if it exists. In the SVM case, predict_proba is computed (in the binary case) using Platt scaling, which is both "expensive" and has "theoretical issues". That is why decision_function is used here (as @Ami said, this is the margin, the distance to the hyperplane, which is accessible without much further computation). In the SVM case, it is advised to use decision_function instead of predict_proba.
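The following sketch shows how this plays out for SVC. It assumes illustrative make_blobs data and an RBF kernel; the probability=True flag is the existing SVC option that turns on the extra calibration step.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative data (an assumption, not from the question).
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# By default SVC does not fit the probability calibration,
# so predict_proba is not exposed and hasattr returns False.
plain = SVC(kernel="rbf").fit(X, y)
has_proba_by_default = hasattr(plain, "predict_proba")

# decision_function is available without any extra fitting cost.
scores = plain.decision_function(X[:5])

# probability=True enables predict_proba, at the price of an
# internal cross-validated calibration during fit.
calibrated = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
proba = calibrated.predict_proba(X[:5])
```

The scikit-learn documentation also notes that the calibrated probabilities can occasionally disagree with the labels from predict, which is another reason to prefer decision_function for SVMs.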
There are other decision_function implementations, for example SGDClassifier's. There, predict_proba is only available for some loss functions, whereas decision_function is always available.
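A small sketch of the SGDClassifier behaviour, assuming illustrative make_classification data. The "modified_huber" loss is used here because it supports predict_proba; the logistic loss does as well, but its parameter name differs between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Illustrative data (an assumption, not from the question).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# The default hinge loss gives a linear SVM: scores only, no probabilities.
hinge = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
hinge_has_proba = hasattr(hinge, "predict_proba")
hinge_scores = hinge.decision_function(X[:5])

# With modified_huber loss the same estimator exposes predict_proba.
huber = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)
huber_proba = huber.predict_proba(X[:5])
huber_scores = huber.decision_function(X[:5])
```

This is why the hasattr check on decision_function in the plotting code is a safe first choice: it works regardless of the loss.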