
What's the difference between predict_proba and decision_function in scikit-learn?

Tags:

scikit-learn

I'm studying a scikit-learn example (Classifier comparison) and got confused with predict_proba and decision_function.

They plot the classification results by drawing the contours using either Z = clf.decision_function(), or Z = clf.predict_proba().

What's the difference between these two? Does each classification method expose only one of the two as its score?

Which one is more appropriate for interpreting the classification result, and how should I choose between the two?

Rosy asked Apr 11 '16 08:04




2 Answers

The latter, predict_proba, is a method of a (soft) classifier that outputs the probability of the instance being in each of the classes.

The former, decision_function, returns a signed score measuring how far the instance lies from the separating hyperplane. For example, an SVM classifier finds hyperplanes separating the space into areas associated with classification outcomes. Given a point, this function returns its distance to the separators (up to a scaling factor), with the sign indicating which side the point is on.

I'd guess that predict_proba is more useful in your case; in general, the other method is more specific to the algorithm.
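To make the relationship concrete, here is a minimal sketch using LogisticRegression, where both methods exist. The synthetic dataset and the variable names are assumptions made for illustration; they are not from the scikit-learn example in the question. For binary logistic regression the positive-class probability should be the sigmoid of the decision score, and the sign of the score should pick the predicted class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem (illustrative data only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

scores = clf.decision_function(X)  # shape (n_samples,): signed, unbounded scores
proba = clf.predict_proba(X)       # shape (n_samples, 2): each row sums to 1

# Binary logistic regression: P(class 1) is the sigmoid of the score.
sigmoid = 1.0 / (1.0 + np.exp(-scores))
same_as_proba = np.allclose(sigmoid, proba[:, 1])

# A positive score selects classes_[1], otherwise classes_[0].
pred_from_scores = clf.classes_[(scores > 0).astype(int)]
same_as_predict = np.array_equal(pred_from_scores, clf.predict(X))

print(same_as_proba, same_as_predict)
```

So the score carries the same ranking information as the probability; the probability is just squashed into [0, 1]. This identity is specific to logistic regression and should not be assumed for other classifiers.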

Ami Tavory answered Oct 05 '22 00:10


Your example is

    if hasattr(clf, "decision_function"):
        Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    else:
        Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]

so the code uses decision_function if it exists. In the SVM case, predict_proba is computed (in the binary case) using Platt scaling, which is both "expensive" and has "theoretical issues". That's why decision_function is used here (as @Ami said, this is the margin - the distance to the hyperplane, which is accessible without much further computation). In the SVM case, it is advised to use decision_function instead of predict_proba.
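A rough sketch of what this means in practice for SVC, assuming synthetic data and illustrative variable names: decision_function is available on a default SVC, while predict_proba only appears when the model is built with probability=True, which triggers the extra calibration step during fit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem (illustrative data only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Default SVC: the margin is available, probabilities are not.
plain = SVC().fit(X, y)
margin = plain.decision_function(X)            # shape (n_samples,)
has_proba = hasattr(plain, "predict_proba")    # False without probability=True

# probability=True adds the Platt-scaling calibration during fit.
calibrated = SVC(probability=True, random_state=0).fit(X, y)
proba = calibrated.predict_proba(X)            # shape (n_samples, 2)
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

print(has_proba, margin.shape, proba.shape, rows_sum_to_one)
```

Because the calibration is fitted separately from the SVM itself, the calibrated probabilities are not guaranteed to agree with predict on every point, which is one more reason to plot the margin.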

Other classifiers have a decision_function too, for example SGDClassifier. There, predict_proba depends on the loss function (it is only available for a logistic loss or modified_huber), whereas decision_function is always available.
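A short check of the SGDClassifier case, again on assumed synthetic data with illustrative names. It compares the default hinge loss with modified_huber, one of the losses that supports probability estimates.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic binary problem (illustrative data only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

hinge = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
huber = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)

# For each loss: (has decision_function, has predict_proba)
availability = {
    name: (hasattr(model, "decision_function"), hasattr(model, "predict_proba"))
    for name, model in [("hinge", hinge), ("modified_huber", huber)]
}
print(availability)
```

This is why the hasattr check in the plotting code tests for decision_function first: it is the method that is most reliably present across linear and margin-based classifiers.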

serv-inc answered Oct 04 '22 23:10