I'm currently using sklearn's Ridge classifier, and am looking to ensemble this classifier with classifiers from sklearn and other libraries. In order to do this, it would be ideal to extract the probability that a given input belongs to each class in a list of classes. Currently, I'm zipping the classes with the output of model.decision_function(x), but this returns the distance from the hyperplane as opposed to a straightforward probability. These distance values vary from around -1 to around 1.
distances = dict(zip(clf.classes_, clf.decision_function(x)[0]))
How can I convert these distances to a more concrete set of probabilities (a series of positive values that sum to 1)? I'm looking for something like clf.predict_proba(), which is implemented for the SVC in sklearn.
Yes, ridge regression can be used as a classifier: just encode the response labels as -1 and +1 and fit the regression model as normal.
In the context of classification tasks, some sklearn estimators also implement a predict_proba method that returns the class probabilities for each data point.
The Ridge Classifier, based on the Ridge regression method, converts the labels into [-1, 1] and solves the problem with a regression method. The highest value in the prediction is taken as the target class, and for multiclass data multi-output regression is applied.
The sklearn library's predict_proba() method generates a two-column array: the first column is the probability that the outcome will be 0 and the second is the probability that the outcome will be 1. The two values in each row sum to one.
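As a minimal sketch of that behaviour, using LogisticRegression (which does implement predict_proba; RidgeClassifier itself does not) on synthetic data:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:3])   # shape (3, 2): columns are P(y=0) and P(y=1)
print(proba)
print(proba.sum(axis=1))           # each row sums to 1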
The solutions provided here didn't work for me. I think the softmax function is the correct solution, so I extended the RidgeClassifierCV class with a predict_proba method similar to the one in LogisticRegressionCV:
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.utils.extmath import softmax

class RidgeClassifierCVwithProba(RidgeClassifierCV):
    def predict_proba(self, X):
        d = self.decision_function(X)  # (n_samples,) for binary, (n_samples, n_classes) for multiclass
        if d.ndim == 1:
            d = np.c_[-d, d]           # binary case: build a two-column score matrix [-d, d]
        return softmax(d)
Further exploration led to using the softmax function:
import numpy as np

d = clf.decision_function(x)[0]          # scores for one sample, one per class
probs = np.exp(d) / np.sum(np.exp(d))    # softmax: positive values that sum to 1
This guarantees a 0-1 bounded distribution that sums to 1.
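For scoring many samples at once the same idea applies row-wise (a sketch, assuming a fitted multiclass clf, a 2-D X, and numpy imported as np as above):

d = clf.decision_function(X)                                # shape (n_samples, n_classes)
probs = np.exp(d) / np.exp(d).sum(axis=1, keepdims=True)    # each row sums to 1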
A little look at the source code of predict shows that decision_function is in fact the logit-transform of the actual class probabilities: if the decision function is f, then the class probability of class 1 is exp(f) / (1 + exp(f)). This translates to the following check in the sklearn source:
scores = self.decision_function(X)
if len(scores.shape) == 1:
    indices = (scores > 0).astype(np.int)
else:
    indices = scores.argmax(axis=1)
return self.classes_[indices]
This check tells you that if the decision function is greater than zero, class 1 is predicted, otherwise class 0: a classical logit approach.
So, you will have to turn the decision function into something like:
import numpy

d = clf.decision_function(x)[0]               # decision score(s) for one sample
probs = numpy.exp(d) / (1 + numpy.exp(d))     # sigmoid: maps each score into (0, 1)
And then zip these values with clf.classes_ as before.
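Putting it together (a sketch, assuming a multiclass clf so that d holds one score per class; the sigmoid values do not necessarily sum to 1, so they are normalized here to get a proper distribution):

import numpy

d = clf.decision_function(x)[0]                    # one score per class
sig = numpy.exp(d) / (1 + numpy.exp(d))            # sigmoid of each score
probs = dict(zip(clf.classes_, sig / sig.sum()))   # normalize so the values sum to 1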