When using XGBoost, we need to convert categorical variables into numeric ones.
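Would there be any difference in performance/evaluation metrics between the following methods:
1. dummifying (one-hot encoding) the categorical variables, or
2. encoding each category as an integer (label encoding)?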
Also: would there be any reason not to go with method 2, for example by using LabelEncoder?
"When using XGBoost we need to convert categorical variables into numeric." Not always, no. If booster=='gbtree' (the default), XGBoost can handle categorical variables encoded as integer codes directly, without dummifying/one-hot encoding them: the trees split on the codes, and a few successive splits can isolate any single category.
XGBoost with label encoding for categorical variables
Label encoding transforms categorical values into numerical values. The workflow is:
1. Label-encode the categorical columns.
2. Split the data into a training set and a test set.
3. Tune the XGBoost hyperparameters.
4. Train the XGBoost model on the training set.
If your data contains categorical variables, you must encode them as numbers before you can fit and evaluate a model. The two most popular techniques are integer encoding and one-hot encoding, although a newer technique, learned embedding, may provide a useful middle ground between the two.
As far as XGBoost is concerned, one-hot encoding becomes necessary because XGBoost accepts only numeric features.
XGBoost only deals with numeric columns.
Suppose you have a feature [a, b, b, c] which describes a categorical variable (i.e. there is no numeric relationship between its values). Using LabelEncoder, you will simply get this:
array([0, 1, 1, 2])
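A minimal scikit-learn sketch that reproduces this output, assuming scikit-learn is installed. LabelEncoder sorts the distinct values and assigns them consecutive integers; its classes_ attribute records the mapping, and inverse_transform maps codes back to the original strings.

```python
from sklearn.preprocessing import LabelEncoder

feature = ["a", "b", "b", "c"]

enc = LabelEncoder()
codes = enc.fit_transform(feature)  # sorted distinct values -> 0, 1, 2
print(codes)
print(enc.classes_)                 # position i holds the string encoded as i
print(enc.inverse_transform([2, 0]))
```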
This just maps each string ('a', 'b', 'c') to an integer, nothing more. XGBoost, however, will wrongly interpret this feature as having a numeric relationship!
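To make the "numeric relationship" concrete, here is a small pure-Python sketch; the dictionaries and helper names are hypothetical. Read as numbers, the label codes say that 'c' is twice as far from 'a' as 'b' is and impose the order a < b < c, whereas the one-hot vectors of any two distinct categories are the same distance apart.

```python
import math

# Integer codes as LabelEncoder would assign them to 'a', 'b', 'c'.
LABEL_CODES = {"a": 0, "b": 1, "c": 2}
# The same categories as one-hot vectors.
ONE_HOT = {"a": (1, 0, 0), "b": (0, 1, 0), "c": (0, 0, 1)}


def code_distance(x, y):
    """Distance between two categories when their integer codes are read as numbers."""
    return abs(LABEL_CODES[x] - LABEL_CODES[y])


def onehot_distance(x, y):
    """Euclidean distance between the one-hot vectors of two categories."""
    return math.dist(ONE_HOT[x], ONE_HOT[y])


print(code_distance("a", "b"), code_distance("a", "c"))      # 1 2
print(onehot_distance("a", "b"), onehot_distance("a", "c"))  # both sqrt(2)
```

For a tree booster the relevant effect is the ordering: a single threshold split can only separate a contiguous run of codes, so isolating 'b' from both 'a' and 'c' needs two splits.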
Proper way
Using OneHotEncoder, you will eventually get to this:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
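A minimal sketch of how OneHotEncoder produces this matrix, assuming scikit-learn and NumPy are installed. OneHotEncoder expects a 2-D input, so the feature is reshaped into a single column, and its default output is a SciPy sparse matrix, hence the .toarray() call.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical feature as a single column (shape (4, 1)).
feature = np.array(["a", "b", "b", "c"]).reshape(-1, 1)

enc = OneHotEncoder()
onehot = enc.fit_transform(feature).toarray()  # dense 0/1 matrix, one column per category
print(onehot)
print(enc.categories_)  # the categories learned for each input column
```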
This is the proper representation of a categorical variable for XGBoost or any other machine learning tool.
pandas' get_dummies is a nice tool for creating dummy variables, and in my opinion it is easier to use than OneHotEncoder.
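For comparison, a minimal pandas sketch of the same dummy variables with get_dummies; the DataFrame and its column name feature are hypothetical. Passing dtype=int keeps the dummies as 0/1 integers, since recent pandas versions otherwise return booleans.

```python
import pandas as pd

df = pd.DataFrame({"feature": ["a", "b", "b", "c"]})

# One 0/1 column per category, named "<column>_<category>".
dummies = pd.get_dummies(df, columns=["feature"], dtype=int)
print(dummies)
```

One design difference worth keeping in mind: get_dummies only sees the categories present in the frame it is given, whereas a fitted OneHotEncoder remembers its categories_ and produces the same columns for new data.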
Method #2 in the question above will not represent the data properly.