I am using the LogisticRegression() method in scikit-learn on a highly unbalanced data set, and I have even set the class_weight parameter to 'auto'.

I know that in logistic regression it should be possible to know the threshold value for a particular pair of classes. Is it possible to know what the threshold value is for each of the one-vs-all classifiers that the LogisticRegression() method builds? I did not find anything about this in the documentation.

Does it by default apply 0.5 as the threshold for all classes, regardless of the parameter values?
Logistic regression assigns each row a probability of being True and then makes a prediction for each row where that probability is >= 0.5, i.e. 0.5 is the default threshold.
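As a minimal sketch of that behaviour, on made-up data from make_classification (and using class_weight='balanced', the current name for the old 'auto' option), predict() agrees with applying the 0.5 threshold to predict_proba() by hand:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Made-up unbalanced binary data (roughly 90% negatives)
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
clf = LogisticRegression(class_weight='balanced').fit(X, y)

proba = clf.predict_proba(X)[:, 1]       # P(y = 1) for each row
manual = (proba > 0.5).astype(int)       # apply the 0.5 threshold by hand
print(np.array_equal(manual, clf.predict(X)))  # prints True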
Yes. If you set the threshold to 1, the classifier will always predict 0, which makes its accuracy p(y=0); if you set the threshold to 0, it will always predict 1 and have accuracy p(y=1); in between, the accuracy passes through various values.
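A self-contained sketch of those extreme cases, again on made-up data:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

for t in (0.0, 1.0):
    preds = (proba >= t).astype(int)  # t=0: always predict 1; t=1: always predict 0
    print('threshold', t, 'accuracy', (preds == y).mean())
# accuracy at t=0 is the fraction of 1s, p(y=1); at t=1 it is p(y=0)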
Thresholding: we can convert the probabilities to predictions using a threshold value, t. For example, in a healthcare setting, if the probability of poor care is greater than the threshold t, we predict poor quality care; if it is less than t, we predict good quality care.
There is a little trick that I use: instead of model.predict(test_data), use model.predict_proba(test_data), and then sweep a range of threshold values to analyze their effect on the predictions:
import pandas as pd
from sklearn import metrics
from sklearn.metrics import confusion_matrix

pred_proba_df = pd.DataFrame(model.predict_proba(x_test))
threshold_list = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
                  0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 0.99]
for t in threshold_list:
    print('\n******** For threshold = {} ********'.format(t))
    # Predict class 1 whenever its predicted probability exceeds the threshold
    y_test_pred = (pred_proba_df.iloc[:, 1] > t).astype(int)
    test_accuracy = metrics.accuracy_score(Y_test, y_test_pred)
    print('Our testing accuracy is {}'.format(test_accuracy))
    print(confusion_matrix(Y_test, y_test_pred))
Best!
Logistic regression chooses the class that has the biggest probability. In the case of 2 classes, the threshold is 0.5: if P(Y=0) > 0.5, then obviously P(Y=0) > P(Y=1). The same holds in the multiclass setting: again, it chooses the class with the biggest probability (see, e.g., Ng's lectures, the bottom lines).
Introducing special thresholds only affects the proportion of false positives to false negatives (and thus the precision/recall trade-off), but it is not a parameter of the LR model itself. See also the similar question.
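To sketch that point on made-up 3-class data (all names below are illustrative), predict() simply takes the argmax of predict_proba(), so there is no per-class threshold stored in the model to read off:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Made-up 3-class data (n_informative raised so 3 classes fit)
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X)                       # shape (n_samples, 3); rows sum to 1
argmax_pred = clf.classes_[proba.argmax(axis=1)]   # pick the most probable class
print(np.array_equal(argmax_pred, clf.predict(X)))  # prints True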