xgboost predict_proba : How to do the mapping between the probabilities and the labels

I'm trying to solve a multiclass classification problem using the xgboost algorithm, but I do not know exactly how predict_proba works. In fact, predict_proba generates a list of probabilities, but I don't know which class each probability relates to.

Here is a simple example:

This is my training data:

+------------+----------+-------+
| feature1   | feature2 | label |
+------------+----------+-------+
|    x       |    z     |   3   |
+------------+----------+-------+
|    y       |    u     |   0   |
+------------+----------+-------+
|    x       |    u     |   2   |
+------------+----------+-------+

Then when I try to predict probabilities for a new example:

model.predict_proba([['x', 'u']])

This will return something like this:

[0.2, 0.3, 0.5]

My question is: which class has the probability of 0.5? Is it class 2, 3, or 0?

asked Mar 29 '19 by ABK

People also ask

What does model.predict_proba() do in sklearn?

model.predict_proba(): For classification problems, some estimators also provide this method, which returns the probability that a new observation belongs to each categorical label. In this case, the label with the highest probability is what model.predict() returns.

What is the difference between predict_proba and predict?

The predict method is used to predict the actual class, while the predict_proba method can be used to infer the class probabilities (i.e. the probability that a particular data point falls into each of the underlying classes).
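
As a minimal sketch of that difference (the estimator and toy data here are our own, not from the thread), assuming any fitted sklearn-style classifier:

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data, invented for illustration: three classes 0, 1, 2
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 2]

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba([[1.5]])  # shape (1, n_classes), columns follow clf.classes_
label = clf.predict([[1.5]])[0]     # the single most likely class
assert label == clf.classes_[np.argmax(proba)]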

How do I get predictions on XGBoost?

To make predictions we use the scikit-learn function model.predict(). By default, the predictions made by XGBoost are probabilities. Because this is a binary classification problem, each prediction is the probability of the input pattern belonging to the first class.
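
With the sklearn wrapper used in this thread the behaviour differs slightly: predict() returns hard class labels, and the probabilities live in predict_proba() (raw probabilities come from the native Booster.predict() with a binary:logistic objective). A hedged sketch with invented data:

import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.random((100, 5))            # dummy features, invented for illustration
y = rng.randint(0, 2, 100)          # binary target: 0 or 1

model = xgb.XGBClassifier(n_estimators=10, max_depth=2)
model.fit(X, y)

print(model.predict(X[:3]))         # hard 0/1 labels
print(model.predict_proba(X[:3]))   # shape (3, 2): columns are P(class 0), P(class 1)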

How is predict_proba calculated?

predict_proba() returns the number of votes for each class, divided by the number of trees in the forest. Your precision is exactly 1/n_estimators. If you want to see variation at the 5th digit, you will need 10**5 = 100,000 estimators, which is excessive. You normally don't want more than 100 estimators.
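
Note that this quoted answer describes a random forest; xgboost's probabilities come from a logistic/softmax transform of summed tree outputs, not vote counting. A sketch of the vote-fraction behaviour with an assumed RandomForestClassifier:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.random((200, 4))            # dummy data, invented for illustration
y = rng.randint(0, 3, 200)

rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(rf.predict_proba(X[:1]))      # fully grown trees cast pure 0/1 votes,
                                    # so each value here is a multiple of 1/10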


1 Answer

It seems that you are using the sklearn API of xgboost. In that case, the model has a dedicated attribute, model.classes_, that returns the classes the model has learned, and the order of the classes in that array corresponds to the order of the probabilities returned by predict_proba.

Here is an example with dummy data:

import numpy as np
import pandas as pd
import xgboost as xgb

# generate dummy data (10k examples, 10 numeric features, 4 classes of target)
np.random.seed(312)
train_X = np.random.random((10000,10))
train_y_mcc = np.random.randint(0, 4, train_X.shape[0])  # four classes: 0, 1, 2, 3

# model
xgb_model_mpg = xgb.XGBClassifier(max_depth=3, n_estimators=100)
xgb_model_mpg.fit(train_X, train_y_mcc)

# classes
print(xgb_model_mpg.classes_)
>>> [0 1 2 3]
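
To make the mapping explicit, a small follow-up sketch (reusing the variables above, not part of the original answer) pairs each predict_proba column with its class:

proba = xgb_model_mpg.predict_proba(train_X[:1])[0]  # probabilities for one example
for cls, p in zip(xgb_model_mpg.classes_, proba):    # column i belongs to classes_[i]
    print(cls, p)

Applied to the question: classes_ holds the sorted unique training labels, so with labels {0, 2, 3} it would come back as [0 2 3], and the 0.5 would belong to class 3 (recent xgboost versions instead require labels to already be encoded as 0..n_classes-1).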
answered Sep 28 '22 by Mischa Lisovyi