
scikit-learn .predict() default threshold

I'm working on a classification problem with unbalanced classes (5% 1's). I want to predict the class, not the probability.

In a binary classification problem, is scikit's classifier.predict() using 0.5 by default? If it doesn't, what's the default method? If it does, how do I change it?

In scikit some classifiers have the class_weight='auto' option, but not all do. With class_weight='auto', would .predict() use the actual population proportion as a threshold?

How would I do this with a classifier like MultinomialNB, which doesn't support class_weight, other than using predict_proba() and then calculating the classes myself?

ADJ asked Nov 14 '13 18:11





1 Answer

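predict() takes no threshold argument. For a binary classifier it returns the more probable class, which amounts to a cutoff of 0.5. To use a different cutoff, call clf.predict_proba() and compare the positive-class probability against the threshold yourself.

For example: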

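    from sklearn.tree import DecisionTreeClassifier

    # X_train, y_train and X_test are assumed to be defined already
    clf = DecisionTreeClassifier(random_state=2)
    clf.fit(X_train, y_train)

    # y_pred = clf.predict(X_test)  # default threshold is 0.5

    # Column 1 of predict_proba is the probability of clf.classes_[1];
    # label a row 1 when that probability is at least 0.3
    y_pred = (clf.predict_proba(X_test)[:, 1] >= 0.3).astype(int)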
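To try this end to end, here is a minimal self-contained sketch. The synthetic dataset, the choice of LogisticRegression, and the helper name `predict_with_threshold` are all assumptions made for illustration and are not part of scikit-learn. It only relies on the classifier exposing predict_proba, so it should work the same way with MultinomialNB or a decision tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def predict_with_threshold(clf, X, threshold=0.5):
    """Hypothetical helper: return clf.classes_[1] where P(class 1) >= threshold."""
    # predict_proba columns are ordered like clf.classes_
    proba_pos = clf.predict_proba(X)[:, 1]
    return np.where(proba_pos >= threshold, clf.classes_[1], clf.classes_[0])


# Synthetic imbalanced data (roughly 5% positives), for illustration only
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_default = clf.predict(X_test)                    # built-in behaviour
y_half = predict_with_threshold(clf, X_test, 0.5)  # explicit 0.5 cutoff
y_low = predict_with_threshold(clf, X_test, 0.3)   # more permissive cutoff

print("agreement of predict() with 0.5 cutoff:", np.mean(y_default == y_half))
print("predicted positives at 0.5:", int(y_half.sum()))
print("predicted positives at 0.3:", int(y_low.sum()))
```

Lowering the cutoff can only keep or increase the number of predicted positives, so on a 5% minority class it trades precision for recall. The cutoff itself is best chosen on a validation set rather than on the test data.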
Yuchao Jiang answered Oct 01 '22 18:10
