scikit-learn return value of LogisticRegression.predict_proba

Tags: python, machine-learning, probability, scikit-learn, logistic-regression

What exactly does the LogisticRegression.predict_proba function return?

In my example I get a result like this:

    [[  4.65761066e-03   9.95342389e-01]
     [  9.75851270e-01   2.41487300e-02]
     [  9.99983374e-01   1.66258341e-05]]

From other calculations using the sigmoid function, I know that the second column contains probabilities. The documentation says that the first column is n_samples, but that can't be, because my samples are reviews, which are text and not numbers. The documentation also says that the second column is n_classes. That certainly can't be either, since I only have two classes (namely +1 and -1), and the function is supposed to calculate the probability of a sample belonging to a class, not the classes themselves.

What is the first column really, and why is it there?


asked Apr 17 '16 19:04

Zelphir Kaltstahl

1 Answer

    4.65761066e-03 + 9.95342389e-01 = 1
    9.75851270e-01 + 2.41487300e-02 = 1
    9.99983374e-01 + 1.66258341e-05 = 1

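The first column is the probability that the entry has the -1 label, and the second column is the probability that the entry has the +1 label, which is why each row sums to 1. The (n_samples, n_classes) in the documentation describes the shape of the returned array (one row per sample, one column per class), not the contents of the columns. Note that the columns are ordered as the classes appear in self.classes_ of the fitted model.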
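To connect this with the sigmoid calculation mentioned in the question, here is a minimal sketch in plain Python. It assumes a binary model whose columns are ordered [-1, +1], and the raw scores are made-up values for illustration; `two_column_proba` is a hypothetical helper, not a scikit-learn function.

```python
import math


def sigmoid(z):
    """Logistic function: maps a raw score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def two_column_proba(scores):
    """Build rows of [P(label = -1), P(label = +1)] from raw scores.

    Hypothetical helper: mimics the layout of a two-class probability
    array, where the second column is sigmoid(score) and the first
    column is its complement.
    """
    rows = []
    for z in scores:
        p_pos = sigmoid(z)
        rows.append([1.0 - p_pos, p_pos])
    return rows


# Made-up raw scores (w.x + b) for three samples.
scores = [5.0, -3.5, -11.0]
rows = two_column_proba(scores)

for row in rows:
    print(row, "sum =", sum(row))
```

With only two classes the first column carries no extra information, since it is always one minus the second. It is there so that the array has one column per class, which keeps the layout the same when there are more than two classes.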

If you want the predicted probabilities for the positive label only, you can use logistic_model.predict_proba(data)[:,1]. For the output above, this yields [9.95342389e-01, 2.41487300e-02, 1.66258341e-05].
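The following is a small sketch of this, assuming scikit-learn and NumPy are installed. The toy data is invented for illustration and the variable names are arbitrary. It looks up the column for the +1 label through `classes_` rather than hard-coding index 1, and checks that, for this default two-class setup, the column matches the sigmoid of `decision_function`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data (made up): one numeric feature, labels -1 and +1.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# One row per sample, one column per class.
proba = model.predict_proba(X)
print(proba.shape)        # (n_samples, n_classes)
print(model.classes_)     # column order of proba

# Find the column that belongs to the +1 label instead of assuming it.
positive_col = list(model.classes_).index(1)
p_positive = proba[:, positive_col]

# For a two-class model with default settings, this column should equal
# the sigmoid of the raw decision scores.
scores = model.decision_function(X)
p_sigmoid = 1.0 / (1.0 + np.exp(-scores))
print(np.allclose(p_positive, p_sigmoid))
```

Looking up the index through `classes_` guards against cases where the labels are encoded differently (for example 0/1 or strings), where the positive class may not sit in the column you expect.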


answered Sep 21 '22 13:09

iulian
