 

How to optimize a sklearn pipeline, using XGboost, for a different `eval_metric`?

I'm trying to use XGBoost and optimize for the eval_metric auc (as described here).

This works fine when using the classifier directly, but fails when I try to use it inside a pipeline.

What is the correct way to pass a .fit argument to the sklearn pipeline?

Example:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from xgboost import XGBClassifier
import xgboost
import sklearn

print('sklearn version: %s' % sklearn.__version__)
print('xgboost version: %s' % xgboost.__version__)

X, y = load_iris(return_X_y=True)

# Without using the pipeline: 
xgb = XGBClassifier()
xgb.fit(X, y, eval_metric='auc')  # works fine

# Making a pipeline with this classifier and a scaler:
pipe = Pipeline([('scaler', StandardScaler()), ('classifier', XGBClassifier())])

# using the pipeline, but not optimizing for 'auc': 
pipe.fit(X, y)  # works fine

# however this does not work (even after correcting the underscores): 
pipe.fit(X, y, classifier__eval_metric='auc')  # fails

The error:
TypeError: before_fit() got an unexpected keyword argument 'classifier__eval_metric'

Regarding the version of xgboost:
xgboost.__version__ shows 0.6
pip3 freeze | grep xgboost shows xgboost==0.6a2.

asked Mar 14 '17 by sapo_cosmico


People also ask

Can I use XGBoost in Sklearn pipeline?

XGBoost works well with Scikit-Learn, has a similar API, and can in most cases be used just like a Scikit-Learn model, so it's natural to be able to build pipelines with both libraries.

What does eval_metric do in XGBoost?

The XGBoost Python API provides a way to assess performance incrementally as trees are added. It uses two arguments: eval_set (usually train and test sets) and the associated eval_metric to measure your error on these evaluation sets.


2 Answers

The error is because you are using a single underscore between the estimator name and its parameter when using it in a pipeline. It should be two underscores.

From the documentation of Pipeline.fit(), we see the correct way of supplying params in fit:

Parameters passed to the fit method of each step, where each parameter name is prefixed such that parameter p for step s has key s__p.

So in your case, the correct usage is:

pipe.fit(X_train, y_train, classifier__eval_metric='auc')

(Notice two underscores between name and param)
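The same step__param convention works for any fit parameter of any step. Here is a runnable sketch using only scikit-learn, with sample_weight standing in for eval_metric (the routing mechanism is identical); LogisticRegression is just a stand-in classifier whose fit() accepts a keyword argument.

```python
# Demonstrating the 's__p' convention from the Pipeline.fit() docs:
# step name 'classifier' + fit param 'sample_weight', joined by two underscores.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
weights = np.ones(len(y))  # trivial weights, just to show the routing

pipe = Pipeline([('scaler', StandardScaler()),
                 ('classifier', LogisticRegression(max_iter=200))])

# Routed to LogisticRegression.fit(..., sample_weight=weights):
pipe.fit(X, y, classifier__sample_weight=weights)
print(pipe.score(X, y))
```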

answered Oct 19 '22 by Vivek Kumar


When the goal is to optimize, I suggest using the sklearn wrapper together with GridSearchCV:

from xgboost.sklearn import XGBClassifier
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in old (< 0.18) versions

It looks like this:

pipe = Pipeline([('scaler', StandardScaler()), ('classifier', XGBClassifier())])

score = 'roc_auc'

param = {
    'classifier__max_depth': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # just as an example; note the two underscores
}

gsearch = GridSearchCV(estimator=pipe, param_grid=param, scoring=score)

GridSearchCV performs cross-validation internally when you fit it (you can control the splitting with its cv parameter):

gsearch.fit(X, y)

And you get the best params and the best score:

gsearch.best_params_, gsearch.best_score_
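Putting the pieces above together, here is a self-contained sketch of the flow. GradientBoostingClassifier stands in for XGBClassifier so the example only needs scikit-learn (the double-underscore grid keys work identically), and accuracy is used as the scoring metric since iris is multiclass; both substitutions are illustrative choices, not part of the original answer.

```python
# End-to-end grid search over a pipeline step's parameter.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

pipe = Pipeline([('scaler', StandardScaler()),
                 ('classifier', GradientBoostingClassifier())])

# Two underscores: step name 'classifier' + estimator param 'max_depth'.
param = {'classifier__max_depth': [1, 2, 3]}

gsearch = GridSearchCV(estimator=pipe, param_grid=param,
                       scoring='accuracy', cv=3)
gsearch.fit(X, y)  # cross-validation happens inside fit

print(gsearch.best_params_, gsearch.best_score_)
```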
answered Oct 19 '22 by Edward