 

scikit-learn return value of LogisticRegression.predict_proba

What exactly does the LogisticRegression.predict_proba function return?

In my example I get a result like this:

[[  4.65761066e-03   9.95342389e-01]
 [  9.75851270e-01   2.41487300e-02]
 [  9.99983374e-01   1.66258341e-05]]

From other calculations using the sigmoid function, I know that the second column contains probabilities. The documentation says that the first column is n_samples, but that can't be right, because my samples are reviews, which are texts and not numbers. The documentation also says that the second column is n_classes. That can't be right either, since I only have two classes (namely +1 and -1), and the function is supposed to calculate the probability of a sample belonging to a class, not the classes themselves.

What is the first column really, and why is it there?

Zelphir Kaltstahl asked Apr 17 '16 19:04




1 Answer

Note that each row sums to 1 (up to rounding):

4.65761066e-03 + 9.95342389e-01 = 1
9.75851270e-01 + 2.41487300e-02 = 1
9.99983374e-01 + 1.66258341e-05 = 1

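The first column is the probability that the entry has the -1 label and the second column is the probability that the entry has the +1 label. The columns follow the order of the classes in the fitted model's classes_ attribute. The n_samples and n_classes in the documentation describe the shape of the returned array: one row per sample and one column per class.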
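To see the column ordering in practice, here is a minimal sketch. The toy data (a single numeric feature with labels -1 and +1) is made up for illustration and is not the asker's review data; the sigmoid check assumes the default binary LogisticRegression settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one numeric feature, labels -1 and +1.
X = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)

print(model.classes_)      # column order used by predict_proba
print(proba.shape)         # (n_samples, n_classes)
print(proba.sum(axis=1))   # every row sums to 1

# With the default binary setup, the second column is the sigmoid
# of the linear score returned by decision_function.
scores = model.decision_function(X)
sigmoid = 1.0 / (1.0 + np.exp(-scores))
print(np.allclose(proba[:, 1], sigmoid))
```

Here column 0 corresponds to classes_[0] (the -1 label) and column 1 to classes_[1] (the +1 label), which is why the second column matches a hand-computed sigmoid.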

If you would like the predicted probabilities for the positive label only, use logistic_model.predict_proba(data)[:,1]. For your example this yields [9.95342389e-01, 2.41487300e-02, 1.66258341e-05].
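Rather than hard-coding the column index, one option is to look the label up in classes_. The following is only a sketch under assumptions: the toy data is invented, and logistic_model here is a freshly fitted stand-in for the model in the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for the real feature matrix.
data = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
labels = np.array([-1, -1, -1, 1, 1, 1])

logistic_model = LogisticRegression().fit(data, labels)

# Find which column belongs to the +1 label instead of assuming it is 1.
pos_idx = int(np.where(logistic_model.classes_ == 1)[0][0])
pos_proba = logistic_model.predict_proba(data)[:, pos_idx]

print(pos_idx)
print(pos_proba)  # one probability per sample, for the +1 label only
```

With labels -1 and +1 the lookup gives the same column as [:,1], but it keeps working if the labels are later encoded differently.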

iulian answered Sep 21 '22 13:09