Model dump for XGBoost with categorical variable

I am trying to run XGBoost on a mix of categorical and numeric data. I am able to train the model and predict, but I am unable to dump the model to a DataFrame or JSON. Instead I get the error:

Check failed: is_categorical: A in feature map is numerical but tree node is categorical.

I have no issues when using only numeric data, and the same problem occurs when I use NumPy arrays and set feature_types directly.

The toy example below shows the error I am receiving. Any suggestions on what I am doing wrong would be appreciated!

import numpy as np
import pandas as pd
import xgboost as xgb

X1 = [0, 2, 3, 1, 4, 5]
X2 = np.random.rand(6)
X3 = np.random.rand(6)*2
X = pd.DataFrame(data=np.column_stack((X1, X2, X3)), columns=['A', 'B', 'C'])
X['A'] = X['A'].astype('category')
X[['B', 'C']] = X[['B', 'C']].astype('float')
y = pd.DataFrame(data=np.random.rand(6)).astype('float')

XGBParams = {'booster': 'gbtree'}
d = xgb.DMatrix(X, label=y, missing=np.nan, enable_categorical=True)
model = xgb.train(XGBParams, d, num_boost_round=20, verbose_eval=True)
print(model.trees_to_dataframe())
CaptBarnacles asked Jun 05 '26 04:06



1 Answer

Answering my own question:

  1. Updated to the latest XGBoost release at the time (1.7.3). I am not sure this was critical, but it is good practice.
  2. Explicitly set the tree method to hist: XGBParams = {'booster': 'gbtree', 'tree_method': 'hist'}

With these two changes, the code above runs fine, as does my actual code, and the categorical features are used in the trees.

CaptBarnacles answered Jun 06 '26 18:06
