
How to get probabilities for SGDClassifier (LinearSVM)

I'm using SGDClassifier with loss function = "hinge". But hinge loss does not support probability estimates for class labels.

I need probabilities for calculating roc_curve. How can I get probabilities for hinge loss in SGDClassifier without using SVC from svm?

I've seen people mention using CalibratedClassifierCV to get the probabilities, but I've never used it and I don't know how it works.

I really appreciate the help. Thanks

user_6396 asked Sep 17 '25 08:09


1 Answer

In the strict sense, that's not possible.

Support vector machine classifiers are non-probabilistic: they use a hyperplane (a line in 2D, a plane in 3D, and so on) to separate points into one of two classes. A point is characterized only by which side of the hyperplane it lies on, and that directly forms the prediction.

This is in contrast with probabilistic classifiers like logistic regression and decision trees, which generate a probability for every point that is then converted to a prediction.
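To make the contrast concrete, here is a minimal sketch on synthetic data (the dataset and the variable names are assumptions for illustration only). It shows that a hinge-loss SGDClassifier exposes only signed scores via decision_function, whose sign gives the prediction, while LogisticRegression exposes predict_proba:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier

# Synthetic binary data, purely for illustration (not from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Linear SVM trained with SGD: the model only knows about the hyperplane.
svm = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
scores = svm.decision_function(X)      # signed scores, one per sample
side_pred = (scores > 0).astype(int)   # which side of the hyperplane
matches_predict = bool(np.array_equal(side_pred, svm.predict(X)))
has_proba = hasattr(svm, "predict_proba")  # not available for hinge loss

# Probabilistic classifier: class probabilities come first, labels second.
logreg = LogisticRegression().fit(X, y)
proba = logreg.predict_proba(X)        # one row per sample, one column per class
```

Here matches_predict confirms that the sign of the score is the prediction, and has_proba shows that predict_proba is not exposed with hinge loss.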

CalibratedClassifierCV is a sort of meta-estimator; to use it, you simply pass your instance of a base estimator to its constructor, so this will work:

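from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import SGDClassifier

# X, y: your training features and labels
base_model = SGDClassifier(loss="hinge")  # hinge is the default loss (linear SVM)
model = CalibratedClassifierCV(base_model)

model.fit(X, y)
model.predict_proba(X)  # shape (n_samples, n_classes)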

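What it does is perform internal cross-validation to create a probability estimate: with the default settings, the base estimator is fitted on the training folds and a sigmoid (Platt scaling) is fitted on the held-out fold to map its decision scores to probabilities. Note that this is similar to what sklearn.svm.SVC does when probability=True.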
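Since the goal in the question is roc_curve, a fuller sketch might look like the following. The synthetic dataset, the train/test split and the variable names are assumptions for illustration; method="sigmoid" and cv=5 are written out explicitly but match the library defaults.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data, purely for illustration (not from the question).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Wrap the hinge-loss model; calibration is learned via 5-fold CV on the training set.
base_model = SGDClassifier(loss="hinge", random_state=0)
model = CalibratedClassifierCV(base_model, method="sigmoid", cv=5)
model.fit(X_train, y_train)

# Probability of the positive class on held-out data.
proba_pos = model.predict_proba(X_test)[:, 1]

# ROC curve and AUC computed from the calibrated probabilities.
fpr, tpr, thresholds = roc_curve(y_test, proba_pos)
auc = roc_auc_score(y_test, proba_pos)
```

Evaluating on data the model was not fitted on keeps the curve honest. Also note that roc_curve only needs a score that ranks the samples, so the raw output of decision_function from a fitted SGDClassifier can be passed as y_score as well if calibrated probabilities are not needed for anything else.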

gmds answered Sep 20 '25 00:09