Suppose I have created a model, and my target variable is either 0, 1 or 2. It seems that if I use predict, the answer is one of 0, 1 or 2. But if I use predict_proba, I get a row with 3 columns for each input row, as follows:
model = ... Classifier # It could be any classifier
m1 = model.predict(mytest)
m2 = model.predict_proba(mytest)
# Now suppose m2[3] = [0.6, 0.2, 0.2]
Suppose I use both predict and predict_proba. If predict_proba gives me the above result at index 3, should I see 0 at index 3 of the result of predict? Is this the case? I am trying to understand how predict and predict_proba relate to each other when used on the same model.
predict() is used to predict the actual class (in your case one of 0, 1, or 2). predict_proba() is used to predict the class probabilities.
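As a minimal sketch of the difference in output shape, the snippet below fits a classifier on the Iris dataset, which happens to have three classes labelled 0, 1 and 2. The dataset and the choice of LogisticRegression are assumptions for illustration only; the question does not say which classifier or data is used.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: Iris stands in for the asker's data (three classes: 0, 1, 2).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

labels = model.predict(X)       # shape (n_samples,): one class label per row
probs = model.predict_proba(X)  # shape (n_samples, 3): one probability per class

print(labels.shape, probs.shape)
print(labels[3], probs[3])      # label and probability row for the same sample

# Each row of predict_proba is a probability distribution over the classes.
rows_sum_to_one = np.allclose(probs.sum(axis=1), 1.0)
```

Each entry of labels is a single class, while each row of probs has one column per class and sums to 1.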
From the example output that you shared, predict() would output class 0, since the class probability for 0 is 0.6, the highest of the three (most scikit-learn classifiers return the most probable class from predict()). [0.6, 0.2, 0.2] is the output of predict_proba, and it simply denotes that the class probabilities for classes 0, 1, and 2 are 0.6, 0.2, and 0.2 respectively.
Now, as the documentation mentions for predict_proba, the resulting array is ordered based on the labels you've been using:
The returned estimates for all classes are ordered by the label of classes.
Therefore, in your case where your class labels are [0, 1, 2], the output of predict_proba will contain the corresponding probabilities in that order. 0.6 is the probability of the instance being classified as 0, and the two 0.2 values are the probabilities of it being classified as 1 and 2 respectively.
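The column ordering can be checked through the fitted model's classes_ attribute. The sketch below relabels the Iris classes as 10, 20 and 30 (hypothetical labels, chosen only so that labels and column positions differ) and uses a RandomForestClassifier, which is an assumption and not necessarily the classifier from the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# Hypothetical relabelling: 0 -> 10, 1 -> 20, 2 -> 30.
y_relabelled = np.array([10, 20, 30])[y]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y_relabelled)

probs = model.predict_proba(X)
# Column j of probs holds the probability of label model.classes_[j].
print(model.classes_)

# Map each row's most probable column back to its class label.
from_proba = model.classes_[np.argmax(probs, axis=1)]
same = np.array_equal(from_proba, model.predict(X))
print(same)
```

For this estimator the labels recovered from predict_proba match predict. That is not guaranteed for every estimator: the scikit-learn documentation notes, for example, that SVC with probability=True can give probabilities that are inconsistent with predict.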
For a more comprehensive explanation, refer to the article "What is the difference between predict() and predict_proba() in scikit-learn" on TDS.