I am using the SVM implementation in scikit-learn v0.13.1 to solve a binary classification problem. I use k-fold cross-validation and compute the area under the ROC curve (roc_auc) to test the quality of my model. However, for some folds the roc_auc is less than 0.5, even on the training data. Shouldn't that be impossible? Shouldn't the algorithm always be able to reach at least 0.5 on the data it is being trained on?
Here's my code:
from sklearn import svm, cross_validation
from sklearn.metrics import roc_curve, auc

classifier = svm.SVC(kernel='poly', degree=3, probability=True, max_iter=100000)
kf = cross_validation.KFold(len(myData), n_folds=3, indices=False)
for train, test in kf:
    Fit = classifier.fit(myData[train], classVector[train])
    # AUC on the held-out fold
    probas_ = Fit.predict_proba(myData[test])
    fpr, tpr, thresholds = roc_curve(classVector[test], probas_[:, 1])
    roc_auc = auc(fpr, tpr)
    # AUC on the training fold itself
    probas_ = Fit.predict_proba(myData[train])
    fpr2, tpr2, thresholds2 = roc_curve(classVector[train], probas_[:, 1])
    roc_auc2 = auc(fpr2, tpr2)
    print "Training auc: ", roc_auc2, " Testing auc: ", roc_auc
The output looks like this:
Training auc: 0.423920939062 Testing auc: 0.388436883629
Training auc: 0.525472613736 Testing auc: 0.565581854043
Training auc: 0.470917930528 Testing auc: 0.259344660194
Is an area under the curve of less than 0.5 meaningful? In principle, if both the train and test values are < 0.5 I could just invert the prediction for every point, but I am worried something is going wrong. I thought that even if I gave it completely random data, the algorithm should reach 0.5 on the training data?
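For example, this is the kind of sanity check I have in mind (a minimal sketch with made-up random data; the SVC parameters mirror my real ones):

from sklearn import svm
from sklearn.metrics import roc_curve, auc
import numpy as np

# Purely random features and labels: there is no signal to learn,
# so I would expect the training AUC to come out near (or above) 0.5.
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = rng.randint(0, 2, 200)

clf = svm.SVC(kernel='poly', degree=3, probability=True, max_iter=100000)
clf.fit(X, y)
probas = clf.predict_proba(X)
fpr, tpr, thresholds = roc_curve(y, probas[:, 1])
print "Random-data training auc: ", auc(fpr, tpr)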
An ROC curve with an AUC of 0.5 ranks a random positive example higher than a random negative example only 50% of the time. The corresponding classification model is basically worthless, as its predictive ability is no better than random guessing.
To improve the AUC, the goal is simply to improve the performance of the classifier overall. Several measures can be tried, although which one works will depend on the problem and the data. (1) Feature normalization and scaling, sketched below.
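As a sketch of point (1), reusing the variable names from the question's code, the features could be standardized inside each fold before fitting (StandardScaler here; older scikit-learn versions expose the same idea as sklearn.preprocessing.Scaler or the scale function):

from sklearn import preprocessing

# Standardize each feature to zero mean and unit variance.
# SVMs are sensitive to feature scales, so this often helps.
scaler = preprocessing.StandardScaler().fit(myData[train])
X_train = scaler.transform(myData[train])
X_test = scaler.transform(myData[test])  # reuse the training-fold statistics
Fit = classifier.fit(X_train, classVector[train])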
The area under the ROC curve (AUC) is conventionally rated as excellent for values between 0.9 and 1, good between 0.8 and 0.9, fair between 0.7 and 0.8, poor between 0.6 and 0.7, and failed between 0.5 and 0.6.
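Purely as an illustration, that rubric maps to a small helper (a hypothetical function, not part of scikit-learn):

def auc_quality(auc_value):
    # Qualitative bands from the rubric above
    if auc_value >= 0.9: return "excellent"
    if auc_value >= 0.8: return "good"
    if auc_value >= 0.7: return "fair"
    if auc_value >= 0.6: return "poor"
    if auc_value >= 0.5: return "failed"
    return "worse than random (consider inverting the scores)"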
AUC represents the probability that a random positive example is ranked above a random negative example. AUC ranges in value from 0 to 1: a model whose predictions are 100% wrong has an AUC of 0.0, and one whose predictions are 100% correct has an AUC of 1.0.
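That pairwise interpretation can be verified directly: the fraction of (positive, negative) pairs in which the positive example scores higher equals the trapezoidal AUC. A minimal sketch with made-up scores:

import numpy as np
from sklearn.metrics import roc_curve, auc

y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = scores[y_true == 1]
neg = scores[y_true == 0]
# Count correctly ranked positive/negative pairs (ties count as half)
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
print "Pairwise estimate:", sum(pairs) / len(pairs)

fpr, tpr, thresholds = roc_curve(y_true, scores)
print "Trapezoidal AUC:  ", auc(fpr, tpr)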
Indeed you could invert your predictions, and that is exactly what an AUROC < 0.5 suggests: the scores are anti-correlated with the labels. It is normally not a problem to do so; just be consistent and either always or never invert them, and do it on both the training and test sets, as in the sketch below.
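Reusing the names from the question's loop, inverting is just flipping the score (a minimal sketch; an AUC of a becomes 1 - a):

probas_test = Fit.predict_proba(myData[test])
# 1 - P(class 1) is the same as P(class 0) = probas_test[:, 0]
inverted = 1.0 - probas_test[:, 1]
fpr, tpr, thresholds = roc_curve(classVector[test], inverted)
print "Inverted testing auc: ", auc(fpr, tpr)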
The reason for this problem could be that the classifier.fit or roc_curve functions misinterpreted the classVector you passed. It is probably better to fix that instead of inverting: read their documentation to learn exactly what data they expect. In particular, you didn't specify which label is positive; see the pos_label argument to roc_curve and make sure y_true is properly specified, for example as below.
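For example (again reusing the question's variables; pos_label is required whenever the labels are not {0, 1} or {-1, 1}):

# Be explicit about which label counts as positive. Note that the
# columns of predict_proba follow classifier.classes_, so probas_[:, 1]
# must be the probability of that same positive class.
fpr, tpr, thresholds = roc_curve(classVector[test], probas_[:, 1], pos_label=1)
print "Testing auc: ", auc(fpr, tpr)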
However, what is worrisome is that some of your AUROCs are above 0.5 on the training set and some below, and all of them are close to it. This probably means that your classifier performs little better than random.