Suppose I have created a model, and my target variable is either 0, 1 or 2. It seems that if I use predict, the answer is either of 0, or 1 or 2. But if I use predict_proba, I get a row with 3 cols for each row as follows, for example
model = ... Classifier # It could be any classifier
m1 = model.predict(mytest)
m2= model.predict_proba(mytest)
# Now suppose m1[3] = [0.6, 0.2, 0.2]
Suppose I use both predict and predict_proba. If in index 3, I get the above result with the result of predict_proba, in index 3 of the result of predict I should see 0. Is this the case? I am trying to understand how using both predict and predict_proba on the same model relate to each other.
predict() is used to predict the actual class (in your case one of 0, 1, or 2).predict_proba() is used to predict the class probabilities
From the example output that you shared,
predict() would output class 0 since the class probability for 0 is 0.6.[0.6, 0.2, 0.2] is the output of predict_proba that simply denotes that the class probability for classes 0, 1, and 2 are 0.6, 0.2, and 0.2 respectively.Now as the documentation mentions for predict_proba, the resulting array is ordered based on the labels you've been using:
The returned estimates for all classes are ordered by the label of classes.
Therefore, in your case where your class labels are [0, 1, 2], the corresponding output of predict_proba will contain the corresponding probabilities. 0.6 is the probability of the instance to be classified as 0 and 0.2 are the probabilities that the instance is categorised as 1 and 2 respectively.
For a more comprehensive explanation, refer to the article What is the difference between predict() and predict_proba() in scikit-learn on TDS.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With