I have trouble understanding the difference (if there is one) between roc_auc_score() and auc() in scikit-learn.
I'm trying to predict a binary output with imbalanced classes (around 1.5% for Y = 1).
model_logit = LogisticRegression(class_weight='auto')
model_logit.fit(X_train_ridge, Y_train)

false_positive_rate, true_positive_rate, thresholds = roc_curve(Y_test, model_logit.predict_proba(X_test)[:, 1])
auc(false_positive_rate, true_positive_rate)
Out[490]: 0.82338034042531527
and
roc_auc_score(Y_test, model_logit.predict(X_test))
Out[493]: 0.75944737191205602
Can somebody explain this difference? I thought both were just calculating the area under the ROC curve. It might be because of the imbalanced dataset, but I could not figure out why.
Thanks!
ROC is a probability curve and AUC represents the degree or measure of separability: it tells how well the model can distinguish between classes. The higher the AUC, the better the model is at predicting 0 classes as 0 and 1 classes as 1.
The AUC for the ROC curve can be calculated using the roc_auc_score() function. Like the roc_curve() function, it takes both the true outcomes (0, 1) from the test set and the predicted probabilities for the 1 class. It returns a score between 0.0 and 1.0, where 0.5 corresponds to no skill and 1.0 to perfect skill.
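For example, here is a minimal sketch with made-up labels and scores (not the asker's data) showing the expected inputs:

from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                  # true outcomes from a test set
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]   # predicted probabilities for class 1

print(roc_auc_score(y_true, y_scores))       # ~0.89 here; 0.5 would be no skill, 1.0 perfect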
Accuracy is a very commonly used metric, even in everyday life, and it is understandable and intuitive even to a non-technical person. In contrast, AUC is used only for classification problems with probability outputs, in order to analyze the predictions more deeply.
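As a rough illustration on purely synthetic data (not the asker's), a model facing a ~1.5% positive rate can look excellent on accuracy while its AUC shows it has no real ranking power:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.RandomState(0)
X = rng.rand(10000, 5)                       # features unrelated to the target
y = (rng.rand(10000) < 0.015).astype(int)    # ~1.5% positives

model = LogisticRegression().fit(X, y)
print(accuracy_score(y, model.predict(X)))             # ~0.985, just the majority class
print(roc_auc_score(y, model.predict_proba(X)[:, 1]))  # ~0.5, i.e. no real skill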
The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.
AUC is not always the area under a ROC curve. "Area Under the Curve" is an (abstract) area under some curve, so it is a more general term than AUROC. With imbalanced classes, it may be better to compute the AUC of a precision-recall curve.
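A minimal sketch (made-up labels and scores, not the asker's data) of how the same auc() helper can summarise a precision-recall curve:

from sklearn.metrics import precision_recall_curve, auc, average_precision_score

y_true = [0, 0, 0, 0, 1, 1]
y_scores = [0.1, 0.3, 0.2, 0.6, 0.7, 0.4]

precision, recall, _ = precision_recall_curve(y_true, y_scores)
print(auc(recall, precision))                     # area under the precision-recall curve
print(average_precision_score(y_true, y_scores))  # a closely related built-in summary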
See the sklearn source for roc_auc_score():

def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
    # <...> docstring <...>
    def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
        # <...> bla-bla <...>
        fpr, tpr, tresholds = roc_curve(y_true, y_score,
                                        sample_weight=sample_weight)
        return auc(fpr, tpr, reorder=True)

    return _average_binary_score(
        _binary_roc_auc_score, y_true, y_score,
        average, sample_weight=sample_weight)
As you can see, this first computes a ROC curve and then calls auc() to get the area.
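A quick check on synthetic data (my own sketch, not part of the question) confirms that the two routes agree when both are given the same scores:

import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

rng = np.random.RandomState(0)
y = rng.randint(2, size=50)
scores = rng.rand(50)

fpr, tpr, _ = roc_curve(y, scores)
print(auc(fpr, tpr))             # identical ...
print(roc_auc_score(y, scores))  # ... to this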
I guess your problem is the predict_proba() call. For a plain predict() the outputs are always the same:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc, roc_auc_score

est = LogisticRegression(class_weight='balanced')  # 'balanced' is the newer name for 'auto'
X = np.random.rand(10, 2)
y = np.random.randint(2, size=10)
est.fit(X, y)

# Hard 0/1 predictions fed to both routes give identical results.
false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict(X))
print(auc(false_positive_rate, true_positive_rate))  # 0.857142857143
print(roc_auc_score(y, est.predict(X)))              # 0.857142857143
If you change the above to this, you will sometimes get different outputs:
false_positive_rate, true_positive_rate, thresholds = roc_curve(y, est.predict_proba(X)[:, 1])
print(auc(false_positive_rate, true_positive_rate))  # may differ from the line below
print(roc_auc_score(y, est.predict(X)))
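And, continuing that same snippet (my own sketch), the two values line up again if roc_auc_score() is also given the probabilities rather than the hard predictions:

scores = est.predict_proba(X)[:, 1]
fpr, tpr, _ = roc_curve(y, scores)
print(auc(fpr, tpr))
print(roc_auc_score(y, scores))  # now matches the curve-based value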