I have a training pipeline that relies heavily on XGBoost rather than scikit-learn, only because of how cleanly XGBoost handles null values.
However, I'm tasked with introducing non-technical folks to machine learning, and thought it'd be good to take the idea of a single-tree classifier and talk about how XGBoost generally takes that data structure and "puts it on steroids." Specifically, I want to plot this single-tree classifier to show cutpoints.
Would specifying n_estimators=1 be roughly equivalent to using scikit-learn's DecisionTreeClassifier?
n_estimators: the number of boosting rounds, i.e. the number of trees XGBoost will fit. learning_rate: the shrinkage factor (step size) applied to each new tree's contribution.
After each iteration (which adds one more tree), XGBoost can compute the error on a validation set, provided you supply one. That lets it detect when the model starts to overfit (when the validation error starts to increase) and stop early, which gives you a good estimate of the number of trees for a given set of hyperparameters.
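To make the early-stopping idea concrete, here is a minimal pure-Python sketch of the stopping rule itself. It is not XGBoost's implementation: the helper `best_round_with_patience`, its `patience` argument, and the error values are made up for illustration. In XGBoost the corresponding setting is `early_stopping_rounds`, and where you pass it depends on the library version.

```python
def best_round_with_patience(val_errors, patience):
    """Return (best_round, stopped_at) for per-round validation errors.

    Hypothetical helper. Rounds are 0-indexed. Training stops once
    `patience` rounds have passed without beating the best error so far.
    """
    best_round = 0
    best_error = float("inf")
    for i, err in enumerate(val_errors):
        if err < best_error:
            # New best validation error: remember this round.
            best_round, best_error = i, err
        elif i - best_round >= patience:
            # No improvement for `patience` rounds: stop here.
            return best_round, i
    # Ran out of rounds without triggering the stopping rule.
    return best_round, len(val_errors) - 1


# Made-up validation errors: they improve, then drift upward (overfitting).
errors = [0.40, 0.31, 0.27, 0.25, 0.26, 0.27, 0.29, 0.24]
best, stopped = best_round_with_patience(errors, patience=3)
print(best, stopped)
```

Note that the rule stops before it ever sees the last value, so a larger patience trades extra training time for a lower chance of stopping too soon.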
One of the most important differences between XGBoost and random forest is that XGBoost builds its trees sequentially, with each new tree reducing the model's loss by taking a gradient step in function space, whereas a random forest grows its trees independently on bootstrap samples and averages them, so improving it is mostly a matter of tuning hyperparameters.
Tuning the learning rate and the number of trees in XGBoost: the number of decision trees is varied from 100 to 500 and the learning rate is varied on a log10 scale from 0.0001 to 0.1. That gives 5 values of n_estimators and 4 values of learning_rate, so 20 combinations to evaluate.
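The grid described above can be enumerated as follows. This is only a sketch: the step of 100 between tree counts is an assumption (the text gives just the range and the number of values), and the variable names are illustrative.

```python
from itertools import product

# Assumed spacing: 5 values from 100 to 500 in steps of 100.
n_estimators_grid = [100, 200, 300, 400, 500]
# 4 values on a log10 scale from 0.0001 to 0.1.
learning_rate_grid = [0.0001, 0.001, 0.01, 0.1]

# Every (n_estimators, learning_rate) pair that would be evaluated.
grid = list(product(n_estimators_grid, learning_rate_grid))
print(len(grid))
for n_estimators, learning_rate in grid[:4]:
    print(n_estimators, learning_rate)
```

Each pair would then be scored with cross-validation, for example by passing the two lists as a parameter grid to scikit-learn's GridSearchCV.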
import subprocess
import numpy as np
from xgboost import XGBClassifier, plot_tree
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn import metrics
import matplotlib.pyplot as plt
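RANDOM_STATE = 42  # assumed value; the original constant was not preserved
params = {
    'max_depth': 5,
    'min_samples_leaf': 5,
    'random_state': RANDOM_STATE
}
# The original arguments were lost. n_samples=1000000 is inferred from the
# 250,000-row test split in the reports further down (default test_size=0.25);
# everything else is left at its default.
X, y = make_classification(n_samples=1000000, random_state=RANDOM_STATE)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=RANDOM_STATE)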
# __init__(self, max_depth=3, learning_rate=0.1,
# n_estimators=100, silent=True,
# objective='binary:logistic', booster='gbtree',
# n_jobs=1, nthread=None, gamma=0,
# min_child_weight=1, max_delta_step=0,
# subsample=1, colsample_bytree=1, colsample_bylevel=1,
# reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
# base_score=0.5, random_state=0, seed=None, missing=None, **kwargs)
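xgb_model = XGBClassifier(
    n_estimators=1,  # a single boosting round, i.e. one tree
    max_depth=params['max_depth'],
    random_state=params['random_state']
    # min_samples_leaf is a scikit-learn parameter, so it is not passed here;
    # min_child_weight is XGBoost's rough analogue.
)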
# __init__(self, criterion='gini',
# splitter='best', max_depth=None,
# min_samples_split=2, min_samples_leaf=1,
# min_weight_fraction_leaf=0.0, max_features=None,
# random_state=None, max_leaf_nodes=None,
# min_impurity_decrease=0.0, min_impurity_split=None,
# class_weight=None, presort=False)
sk_model = DecisionTreeClassifier(**params)  # the same shared parameters
xgb_model.fit(Xtrain, ytrain)
xgb_pred = xgb_model.predict(Xtest)
sk_model.fit(Xtrain, ytrain)
sk_pred = sk_model.predict(Xtest)
print(metrics.classification_report(ytest, xgb_pred))
print(metrics.classification_report(ytest, sk_pred))
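# Plot the single XGBoost tree, left to right (requires the graphviz package).
plot_tree(xgb_model, rankdir='LR')
plt.show()
# Export the scikit-learn tree to DOT and render it with the Graphviz `dot` CLI.
export_graphviz(sk_model, 'sk_model.dot')
subprocess.call('dot -Tpng sk_model.dot -o sk_model.png'.split())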
Some performance metrics (I know, I didn't calibrate the classifiers totally)...
>>> print(metrics.classification_report(ytest, xgb_pred))
              precision    recall  f1-score   support

           0       0.86      0.82      0.84    125036
           1       0.83      0.87      0.85    124964

   micro avg       0.85      0.85      0.85    250000
   macro avg       0.85      0.85      0.85    250000
weighted avg       0.85      0.85      0.85    250000

>>> print(metrics.classification_report(ytest, sk_pred))
              precision    recall  f1-score   support

           0       0.86      0.82      0.84    125036
           1       0.83      0.87      0.85    124964

   micro avg       0.85      0.85      0.85    250000
   macro avg       0.85      0.85      0.85    250000
weighted avg       0.85      0.85      0.85    250000
And some pictures: [figures: the XGBoost tree drawn by plot_tree, and the scikit-learn tree rendered to sk_model.png]
So, barring any investigative mistakes or overgeneralizations on my part, an XGBClassifier (and, I would assume, an XGBRegressor) with one estimator seems equivalent to a scikit-learn DecisionTreeClassifier with the same shared parameters, at least judging by the predictions on this dataset.
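One caveat worth sketching: even when both models pick the same splits, the values stored in the leaves differ. The pure-Python sketch below applies the leaf-weight formula from the XGBoost documentation, w = -G / (H + lambda), to the logistic objective. It assumes base_score=0.5, learning_rate=0.1 and reg_lambda=1, matching the defaults in the signature quoted in the code above; some newer XGBoost releases estimate base_score from the data instead. The function names are illustrative, not part of either library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def xgb_leaf_probability(n_pos, n_neg, learning_rate=0.1,
                         reg_lambda=1.0, base_score=0.5):
    """P(y=1) in a leaf after ONE boosting round of binary:logistic.

    Illustrative helper, assuming every sample starts at `base_score`.
    """
    p0 = base_score
    # Sums of first and second derivatives of log loss w.r.t. the margin.
    G = n_pos * (p0 - 1.0) + n_neg * p0
    H = (n_pos + n_neg) * p0 * (1.0 - p0)
    weight = -G / (H + reg_lambda)            # optimal leaf weight
    base_margin = math.log(p0 / (1.0 - p0))   # 0.0 when base_score is 0.5
    # The learning rate shrinks the leaf weight before it is added.
    return sigmoid(base_margin + learning_rate * weight)


def tree_leaf_probability(n_pos, n_neg):
    """Class-1 fraction in the leaf, as a plain decision tree reports it."""
    return n_pos / (n_pos + n_neg)


# A leaf holding 80 positives and 20 negatives (made-up counts).
p_xgb = xgb_leaf_probability(80, 20)
p_tree = tree_leaf_probability(80, 20)
print(round(p_xgb, 3), p_tree)
print((p_xgb > 0.5) == (p_tree > 0.5))
```

Under these assumptions the XGBoost probability stays close to 0.5 after a single shrunken round, while the decision tree reports the raw class fraction. Both fall on the same side of 0.5, which is consistent with predict() returning matching labels even though predict_proba() would not match.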