 

Scikit-learn Ridge classifier: extracting class probabilities

I'm currently using sklearn's Ridge classifier, and am looking to ensemble this classifier with classifiers from sklearn and other libraries. In order to do this, it would be ideal to extract the probability that a given input belongs to each class in a list of classes. Currently, I'm zipping the classes with the output of model.decision_function(x), but this returns the distance from the hyperplane as opposed to a straightforward probability. These distance values vary from around -1 to around 1.

distances = dict(zip(clf.classes_, clf.decision_function(x)[0]))  

How can I convert these distances to a more concrete set of probabilities (a series of positive values that sum to 1)? I'm looking for something like clf.predict_proba() that is implemented for the SVC in sklearn.

asked Mar 20 '14 by Madison May

People also ask

Can I use Ridge regression for classification?

Yes. Ridge regression can be used as a classifier: code the response labels as -1 and +1, then fit the regression model as usual.
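
A minimal sketch of that idea (not from the original answer; the synthetic dataset is assumed): encode the labels as -1/+1, fit plain Ridge regression, and threshold the prediction at 0.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y_pm = np.where(y == 1, 1, -1)              # recode labels as -1 / +1

reg = Ridge(alpha=1.0).fit(X, y_pm)         # ordinary ridge *regression*
pred = np.where(reg.predict(X) > 0, 1, 0)   # threshold the regression output at 0
print((pred == y).mean())                   # training accuracy of the makeshift classifier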

What does predict_proba return?

In the context of classification tasks, some sklearn estimators also implement a predict_proba method that returns the class probabilities for each data point.

How does a ridge classifier work?

The Ridge Classifier, based on the Ridge regression method, converts the label data into [-1, 1] and solves the problem as a regression. The class with the highest predicted value is taken as the target class, and for multiclass data multi-output regression is applied.
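
A small illustrative check of that description (an assumed example, not from the page): for multiclass data, RidgeClassifier.predict is simply the argmax over the per-class decision values.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import RidgeClassifier

X, y = load_iris(return_X_y=True)
clf = RidgeClassifier().fit(X, y)

scores = clf.decision_function(X)               # shape (n_samples, n_classes)
manual = clf.classes_[scores.argmax(axis=1)]    # pick the class with the highest score
print(np.array_equal(manual, clf.predict(X)))   # True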

How do you predict probabilities in Python?

The sklearn library has the predict_proba() method, which can be used to generate a two-column array: the first column is the probability that the outcome will be 0 and the second is the probability that the outcome will be 1. Each row of the two columns sums to one.
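
For example (an illustrative sketch using LogisticRegression, which implements predict_proba; the dataset is assumed):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

proba = model.predict_proba(X[:3])
print(proba)              # column 0: P(y=0), column 1: P(y=1)
print(proba.sum(axis=1))  # each row sums to 1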


3 Answers

The solutions provided here didn't work for me. I think the softmax function is the correct solution, so I extended the RidgeClassifierCV class with a predict_proba method similar to LogisticRegressionCV's:

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.utils.extmath import softmax

class RidgeClassifierCVwithProba(RidgeClassifierCV):
    def predict_proba(self, X):
        # for binary problems decision_function returns one score per sample
        d = self.decision_function(X)
        d_2d = np.c_[-d, d]       # paired scores for class 0 and class 1
        return softmax(d_2d)      # normalize each row to sum to 1
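
A quick usage sketch for the subclass above (the binary toy dataset is assumed; np.c_[-d, d] only makes sense when decision_function returns one score per sample):

from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = RidgeClassifierCVwithProba(alphas=[0.1, 1.0, 10.0]).fit(X, y)
print(clf.predict_proba(X[:3]))   # rows sum to 1; columns follow clf.classes_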
answered Oct 16 '22 by Emanuel


Further exploration led to using the softmax function.

import numpy as np

d = clf.decision_function(x)[0]
probs = np.exp(d) / np.sum(np.exp(d))

This guarantees a 0-1 bounded distribution that sums to 1.
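
A fuller sketch of the same idea (an assumed example, not part of the original answer), applied to the per-class scores of a fitted RidgeClassifier, with the usual max-shift for numerical stability:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import RidgeClassifier

X, y = load_iris(return_X_y=True)
clf = RidgeClassifier().fit(X, y)

d = clf.decision_function(X[:1])[0]     # per-class scores for one sample
d = d - d.max()                         # shift for numerical stability
probs = np.exp(d) / np.sum(np.exp(d))
print(dict(zip(clf.classes_, probs)))   # positive values that sum to 1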

answered Oct 16 '22 by Madison May


A look at the source code of predict shows that decision_function is in fact the logit transform of the class probabilities, i.e., if the decision function is f, then the probability of class 1 is exp(f) / (1 + exp(f)). This corresponds to the following check in the sklearn source:

    scores = self.decision_function(X)
    if len(scores.shape) == 1:
        indices = (scores > 0).astype(np.int)
    else:
        indices = scores.argmax(axis=1)
    return self.classes_[indices]

This check says that if the decision function is greater than zero, predict class 1; otherwise predict class 0 - a classic logit-style decision rule.

So, you will have to turn the decision function into something like:

import numpy

d = clf.decision_function(x)[0]
probs = numpy.exp(d) / (1 + numpy.exp(d))

Then zip the classes with the resulting probabilities as before.
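
For the binary case that could look like this (a hedged sketch; the dataset is assumed): decision_function returns one score per sample, the logistic transform gives P(class 1), and P(class 0) is its complement.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = RidgeClassifier().fit(X, y)

f = clf.decision_function(X[:1])[0]     # a single score for a binary problem
p1 = np.exp(f) / (1 + np.exp(f))        # mapped into (0, 1)
print(dict(zip(clf.classes_, [1 - p1, p1])))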

answered Oct 16 '22 by Sudeep Juvekar