Using cross validation and AUC-ROC for a logistic regression model in sklearn

Tags: python, scikit-learn, logistic-regression, roc, cross-validation

I'm using sklearn to build a logistic regression model and evaluate it with cross-validation, but I can't figure out the right way to do this with the cross_val_score function.

According to the documentation and some examples I saw, I need to pass the function the model, the features, the outcome, and a scoring method. However, AUC isn't computed from class predictions; it needs probabilities, so that different threshold values can be tried and the ROC curve calculated from them. So what's the right approach here? The function accepts 'roc_auc' as a scoring method, so I assume it's compatible; I'm just not sure how to use it correctly. Sample code snippet below.

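from sklearn.linear_model import LogisticRegression
# cross_val_score lives in sklearn.model_selection; the older
# sklearn.cross_validation module was removed in scikit-learn 0.20.
from sklearn.model_selection import cross_val_score

# df is assumed to be an existing pandas DataFrame with columns 'a', 'b', 'c', 'd'.
features = ['a', 'b', 'c']
outcome = 'd'
X = df[features]
y = df[outcome]  # 1-D Series of true labels, not a one-column DataFrame
# Returns one ROC AUC value per fold (10 folds here).
crossval_scores = cross_val_score(LogisticRegression(), X, y, scoring='roc_auc', cv=10)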

Basically, I don't understand why I need to pass y to my cross_val_score function here, instead of probabilities calculated using X in a logistic regression model. Does it just do that part on its own?

706

asked May 17 '17 23:05

NeonBlueHair

1 Answer

All supervised learning methods (including logistic regression) need the true y values to fit a model.

After fitting a model, we generally want to:

Make predictions, and
Score those predictions (usually on 'held out' data, such as by using cross-validation)

cross_val_score gives you cross-validated scores of a model's predictions. But to score the predictions it first needs to make the predictions, and to make the predictions it first needs to fit the model, which requires both X and (true) y.
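The fit, predict, score sequence described above can be sketched by hand and compared with what cross_val_score returns. This is a minimal illustration, not sklearn's internal implementation: the synthetic data from make_classification stands in for the question's df, and the variable names (manual_scores, auto_scores) are made up for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for df[features] and df[outcome] (an assumption).
X, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=0
)

# Use the same deterministic splitter for both approaches so the folds match.
cv = StratifiedKFold(n_splits=10)

manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression()
    # Fitting needs the true labels of the training fold.
    model.fit(X[train_idx], y[train_idx])
    # Probability of the positive class for the held-out fold.
    proba = model.predict_proba(X[test_idx])[:, 1]
    # Scoring needs the true labels of the held-out fold.
    manual_scores.append(roc_auc_score(y[test_idx], proba))
manual_scores = np.array(manual_scores)

# cross_val_score performs the same fit / predict / score steps internally.
auto_scores = cross_val_score(LogisticRegression(), X, y, scoring="roc_auc", cv=cv)
```

The true y is used twice in each fold: once to fit on the training part, and once to score the held-out part. That is why cross_val_score takes y rather than precomputed probabilities.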

As you note, cross_val_score accepts different scoring metrics. If you chose 'f1', for example, the predictions generated during cross_val_score would be class predictions (from the model's predict() method). If you chose 'roc_auc', the predictions used to score the model would be continuous scores rather than class labels: probabilities from predict_proba(), or the equivalent ranking from decision_function() when the model provides one, which yields the same AUC for logistic regression.
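To see which kind of prediction each metric consumes, one can call the scorer objects directly on a fitted model and compare them with a by-hand calculation. This is a hedged sketch on synthetic data; the single train/test split and the variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, get_scorer, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption; stands in for the question's df).
X, y = make_classification(
    n_samples=500, n_features=3, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# 'f1' is computed from hard class labels, i.e. model.predict().
f1_from_scorer = get_scorer("f1")(model, X_test, y_test)
f1_by_hand = f1_score(y_test, model.predict(X_test))

# 'roc_auc' is computed from continuous scores; positive-class
# probabilities give the same ranking, and therefore the same AUC.
auc_from_scorer = get_scorer("roc_auc")(model, X_test, y_test)
auc_by_hand = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
```

In both cases the scorer receives the fitted model, X, and the true y, and works out for itself which prediction method to call.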

175

answered Oct 14 '22 14:10

Max Power
