I am trying to run XGBoost on a mix of categorical and numeric data. I am able to train the model and predict, but I am unable to dump the model to a DataFrame or JSON. Instead I get the error:
Check failed: is_categorical: A in feature map is numerical but tree node is categorical.
I have no issues when using only numeric data, and the same problem occurs when using NumPy arrays and setting feature_types directly.
The toy example below shows the error I am receiving. Any suggestions on what I am doing wrong would be appreciated!
import numpy as np
import pandas as pd
import xgboost as xgb
X1 = [0, 2, 3, 1, 4, 5]
X2 = np.random.rand(6)
X3 = np.random.rand(6)*2
X = pd.DataFrame(data=np.column_stack((X1, X2, X3)), columns=['A', 'B', 'C'])
X['A'] = X['A'].astype('category')
X[['B', 'C']] = X[['B', 'C']].astype('float')
y = pd.DataFrame(data=np.random.rand(6)).astype('float')
XGBParams = {'booster': 'gbtree'}
d = xgb.DMatrix(X, label=y, missing=np.nan, enable_categorical=True)
model = xgb.train(XGBParams, d, num_boost_round=20, verbose_eval=True)
print(model.trees_to_dataframe())
Answering my own question:
{'booster': 'gbtree', 'tree_method': 'hist'}
With these two changes, the code above runs fine, as does my actual code, with categoricals used in the trees.