What exactly does the LogisticRegression.predict_proba function return?
In my example I get a result like this:
[[ 4.65761066e-03 9.95342389e-01] [ 9.75851270e-01 2.41487300e-02] [ 9.99983374e-01 1.66258341e-05]]
From other calculations using the sigmoid function, I know that the second column contains probabilities. The documentation says that the first column is n_samples, but that can't be, because my samples are reviews, which are texts and not numbers. The documentation also says that the second column is n_classes. That certainly can't be, since I only have two classes (namely +1 and -1), and the function is supposed to calculate the probabilities of samples belonging to a class, not the classes themselves.
What is the first column really, and why is it there?
The predict_proba() method accepts a single argument, the data for which the probabilities will be computed, and returns an array of shape (n_samples, n_classes): one row per input data point and one column per class, holding the class probabilities.
predict_proba gives you the probabilities for each value of the target (-1 and +1 in your case) in array form. The number of probabilities in each row is equal to the number of categories in the target variable (2 in your case).
Logistic regression, despite its name, is a classification algorithm rather than a regression algorithm. Based on a given set of independent variables, it is used to estimate a discrete value (0 or 1, yes/no, true/false). It is also called logit or MaxEnt Classifier.
Unlike linear regression, which outputs continuous values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.
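As a rough illustration of the sigmoid step described above, here is a minimal sketch for the two-class case. The names sigmoid and binary_proba, and the score z (standing in for the model's linear output), are hypothetical and chosen only for this example; this is not scikit-learn's internal code.

```python
import math

def sigmoid(z):
    # Logistic sigmoid: maps any real-valued score to the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def binary_proba(z):
    # Hypothetical helper: turns one linear score into a two-entry row,
    # [P(negative class), P(positive class)], which sums to 1.
    p_pos = sigmoid(z)
    return [1.0 - p_pos, p_pos]

# Illustrative scores only; a fitted model would compute z from the features.
for z in (-3.0, 0.0, 5.0):
    print(z, binary_proba(z))
```

The two entries of each row are complementary, which is why a two-class result always has two columns that add up to 1.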
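Each row of your output sums to 1 (up to rounding), as the probabilities over the two classes must:
4.65761066e-03 + 9.95342389e-01 = 1
9.75851270e-01 + 2.41487300e-02 = 1
9.99983374e-01 + 1.66258341e-05 = 1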
The first column is the probability that the entry has the -1 label and the second column is the probability that the entry has the +1 label. Note that classes are ordered as they are in self.classes_.
If you would like to get the predicted probabilities for the positive label only, you can use logistic_model.predict_proba(data)[:,1]. This will yield the [9.95342389e-01, 2.41487300e-02, 1.66258341e-05] result.
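To check the column order on your own model, something like the following sketch may help. The toy data here is made up (a single numeric feature instead of review text), and the variable names are assumptions for illustration; only the scikit-learn calls fit, classes_, predict_proba and decision_function are the library's own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one numeric feature, labels -1 and +1.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)

# classes_ gives the label that each column of proba refers to.
print(model.classes_)
# One row per sample, one column per class.
print(proba.shape)

# Look up the +1 column by label instead of hard-coding index 1.
positive_col = list(model.classes_).index(1)
p_positive = proba[:, positive_col]
print(p_positive)
```

Looking the column up through classes_ is a little more verbose than [:,1], but it keeps working if the labels are encoded differently.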