Sklearn pass fit() parameters to xgboost in pipeline

Question

Similar to "How to pass a parameter to only one part of a pipeline object in scikit learn?", I want to pass parameters to only one step of a pipeline. Normally this works fine, for example:

```python
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

estimator = XGBClassifier()
pipeline = Pipeline([
    ('clf', estimator)
])
```

and executed like this:

```python
pipeline.fit(X_train, y_train, clf__early_stopping_rounds=20)
```

but it fails with:

    /usr/local/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
        114         """
        115         Xt, yt, fit_params = self._pre_transform(X, y, **fit_params)
    --> 116         self.steps[-1][-1].fit(Xt, yt, **fit_params)
        117         return self
        118 

    /usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose)
        443                               early_stopping_rounds=early_stopping_rounds,
        444                               evals_result=evals_result, obj=obj, feval=feval,
    --> 445                               verbose_eval=verbose)
        446 
        447         self.objective = xgb_options["objective"]

    /usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, learning_rates, xgb_model, callbacks)
        201                            evals=evals,
        202                            obj=obj, feval=feval,
    --> 203                            xgb_model=xgb_model, callbacks=callbacks)
        204 
        205 

    /usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
         97                                end_iteration=num_boost_round,
         98                                rank=rank,
    ---> 99                                evaluation_result_list=evaluation_result_list))
        100         except EarlyStopException:
        101             break

    /usr/local/lib/python3.5/site-packages/xgboost-0.6-py3.5.egg/xgboost/callback.py in callback(env)
        196     def callback(env):
        197         """internal function"""
    --> 198         score = env.evaluation_result_list[-1][1]
        199         if len(state) == 0:
        200             init(env)

    IndexError: list index out of range

Whereas calling the estimator directly,

```python
estimator.fit(X_train, y_train, early_stopping_rounds=20)
```

works just fine.
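The traceback suggests the `clf__` routing itself works: `Pipeline.fit` splits off the prefix and passes `early_stopping_rounds` to the final step's `fit` (pipeline.py line 116, then xgboost/sklearn.py line 445). The `IndexError` is raised in xgboost's early-stopping callback, which reads the last entry of `env.evaluation_result_list`; with no evaluation set that list is empty. The prefix routing can be sketched as below. `route_fit_params` is a hypothetical, simplified stand-in for what `Pipeline` does internally, not scikit-learn's actual code.

```python
def route_fit_params(step_names, **fit_params):
    """Hypothetical, simplified version of Pipeline's `<step>__<param>` routing."""
    # One (initially empty) dict of fit kwargs per pipeline step.
    routed = {name: {} for name in step_names}
    for key, value in fit_params.items():
        if "__" not in key:
            raise ValueError(f"Expected '<step>__<param>', got {key!r}")
        # Split only on the first "__": the left part names the step.
        step, param = key.split("__", 1)
        routed[step][param] = value  # KeyError for an unknown step name
    return routed


print(route_fit_params(["clf"], clf__early_stopping_rounds=20))
# {'clf': {'early_stopping_rounds': 20}}
```

So `early_stopping_rounds` reaches `XGBClassifier.fit` intact; what is missing is a validation set for the callback to score.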

Aashita Kesarwani · Accepted Answer

Early stopping always requires a validation set, which is passed through the eval_set argument. Here is how to fix the error in your code:

```python
pipeline.fit(X_train, y_train,
             clf__early_stopping_rounds=20,
             clf__eval_set=[(test_X, test_y)])
```

gdv820 · Answer

I recently used the following steps to pass the eval_metric and eval_set parameters to XGBoost.

1. Create a pipeline that contains only the pre-processing/feature transformation steps.

Here it is built from a pipeline defined earlier, which has the XGBoost model as its last step, by dropping that last step:

```python
pipeline_temp = pipeline.Pipeline(pipeline.cost_pipe.steps[:-1])
```

2. Fit this pipeline:

```python
X_trans = pipeline_temp.fit_transform(X_train[FEATURES], y_train)
```

3. Create your eval_set by applying the fitted transformations to the test set:

```python
eval_set = [(X_trans, y_train), (pipeline_temp.transform(X_test), y_test)]
```

4. Add your XGBoost step back into the pipeline:

```python
pipeline_temp.steps.append(pipeline.cost_pipe.steps[-1])
```

5. Fit the new pipeline, passing the parameters to the XGBoost step (the prefix must match that step's name, here xgboost_model):

```python
pipeline_temp.fit(X_train[FEATURES], y_train,
                  xgboost_model__eval_metric=ERROR_METRIC,
                  xgboost_model__eval_set=eval_set)
```

6. Persist the pipeline if you wish:

```python
import joblib

joblib.dump(pipeline_temp, save_path)
```

Georg Heiler · Answer

This is the solution: https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13755/xgboost-early-stopping-and-other-issues

Both early_stopping_rounds and the watchlist / eval_set need to be passed. Unfortunately, this does not work for me: the data in the watchlist would first need a preprocessing step that is only applied inside the pipeline, so I would have to apply that step manually.

Tags: python, keyword-argument, scikit-learn, pipeline, xgboost
