How to provide weighted eval set to XGBClassifier.fit()?

Question

From the sklearn-style API of XGBClassifier, we can provide eval examples for early-stopping.

eval_set (list, optional) – A list of (X, y) pairs to use as a validation set for early-stopping

However, the format only mentions a pair of features and labels. So if the doc is accurate, there is no place to provide weights for these eval examples.

Am I missing anything?

If it's not achievable in the sklearn-style API, is it supported in the original (i.e. non-sklearn) xgboost API? A short example would be nice, since I have never used that version of the API.

user667489 · Accepted Answer

As of a few weeks ago, there is a new parameter for the fit method, sample_weight_eval_set, that allows you to do exactly this. It takes a list of weight arrays, one per evaluation set. I don't think this feature has made it into a stable release yet, but it is available right now if you compile xgboost from source.

https://github.com/dmlc/xgboost/blob/b018ef104f0c24efaedfbc896986ad3ed1b66774/python-package/xgboost/sklearn.py#L235

Max Power · Answer

EDIT - UPDATED per conversation in comments

Given that you have a target variable representing real-valued gain/loss values which you would like to classify as "gain" or "loss", and you would like the classifier's validation set to weight the gains/losses with the largest absolute values most heavily, here are two possible approaches:

1. Create a custom classifier which is just an XGBRegressor fed to a threshold, where the real-valued regression predictions are converted to 1/0 or "gain"/"loss" classifications. The .fit() method of this classifier would just call .fit() of the regressor, while its .predict() method would call .predict() of the regressor and then return the thresholded category predictions.
2. You mentioned you would like to try weighting the treatment of the records in your validation set, but there is no option for this in xgboost. The way to do this would be to implement a custom eval metric. However, you pointed out that eval_metric must be able to return a score for a single label/pred record at a time, so it couldn't accept all your row values and perform the weighting in the eval metric. The solution you mentioned in your comment was to "create a callable which has a ref to all validation examples, pass the indices (instead of labels and scores) into eval_set, use the indices to fetch labels and scores from within the callable and return metric for each validation examples." This should also work.
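The weighting idea in the second approach can be sketched as a closure that holds the validation weights and returns a weighted error rate. This is only an illustration in plain NumPy: make_weighted_error and the toy arrays are hypothetical, it assumes the metric sees the full arrays of labels and predicted probabilities in validation-set order, and how such a callable is hooked into xgboost depends on the version and is not shown.

```python
import numpy as np


def make_weighted_error(weights, threshold=0.5):
    """Return a metric that keeps a reference to per-row validation weights."""
    weights = np.asarray(weights, dtype=float)

    def weighted_error(y_true, y_prob):
        # Threshold probabilities into 0/1 predictions.
        y_pred = (np.asarray(y_prob) >= threshold).astype(int)
        wrong = (y_pred != np.asarray(y_true)).astype(float)
        # Each mistake counts in proportion to its row weight.
        return float(np.sum(weights * wrong) / np.sum(weights))

    return weighted_error


# Toy validation set: real-valued gains/losses and their binary labels.
gains = np.array([-5.0, 0.2, 3.0, -0.1])
labels = (gains > 0).astype(int)
probs = np.array([0.9, 0.6, 0.7, 0.2])  # only the first row is misclassified

# Weight each row by the absolute size of its gain/loss.
metric = make_weighted_error(np.abs(gains))
score = metric(labels, probs)
unweighted = float(np.mean((probs >= 0.5).astype(int) != labels))
```

Here the single mistake falls on the row with the largest absolute gain/loss, so the weighted error (5 / 8.3) is much higher than the unweighted error (1 / 4).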

I would tend to prefer option 1 as more straightforward, but trying two different approaches and comparing the results is generally a good idea if you have the time, so I'd be interested to hear how these turn out for you.
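A rough sketch of the regressor-plus-threshold idea from option 1. The wrapper class name is hypothetical, and to keep the example runnable without xgboost it uses a small least-squares stand-in regressor; the assumption is that any object with fit() and predict(), such as an XGBRegressor, could be passed in its place.

```python
import numpy as np


class ThresholdedRegressorClassifier:
    """Classifier that thresholds the output of a wrapped regressor."""

    def __init__(self, regressor, threshold=0.0):
        self.regressor = regressor
        self.threshold = threshold

    def fit(self, X, y_real, **fit_kwargs):
        # Train on the real-valued gain/loss target, not on the 0/1 labels.
        self.regressor.fit(X, y_real, **fit_kwargs)
        return self

    def predict(self, X):
        # 1 = "gain", 0 = "loss".
        raw = np.asarray(self.regressor.predict(X))
        return (raw > self.threshold).astype(int)


class LeastSquaresRegressor:
    """Stand-in regressor (assumption: swap in XGBRegressor in practice)."""

    def fit(self, X, y):
        X1 = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        X1 = np.column_stack([X, np.ones(len(X))])
        return X1 @ self.coef_


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
gain = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=200)

clf = ThresholdedRegressorClassifier(LeastSquaresRegressor()).fit(X, gain)
pred = clf.predict(X)
accuracy = float(np.mean(pred == (gain > 0).astype(int)))
```

Because the regressor is trained on the real-valued target, a squared-error loss already penalizes large gains and losses more heavily, which is the motivation for this option.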

Tags: scikit-learn, xgboost

Roy
