Grid search with LightGBM example

I am trying to find the best parameters for a lightgbm model using GridSearchCV from sklearn.model_selection. I have not been able to find a solution that actually works.

I have managed to set up partly working code:

import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold

np.random.seed(1)

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
y = pd.read_csv('y.csv')
y = y.values.ravel()
print(train.shape, test.shape, y.shape)

categoricals = ['COL_A','COL_B']
indexes_of_categories = [train.columns.get_loc(col) for col in categoricals]

gkf = KFold(n_splits=5, shuffle=True, random_state=42).split(X=train, y=y)

param_grid = {
    'num_leaves': [31, 127],
    'reg_alpha': [0.1, 0.5],
    'min_data_in_leaf': [30, 50, 100, 300, 400],
    'lambda_l1': [0, 1, 1.5],
    'lambda_l2': [0, 1]
    }

lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt',
                                   objective='binary',
                                   num_boost_round=2000,
                                   learning_rate=0.01,
                                   metric='auc',
                                   categorical_feature=indexes_of_categories)

gsearch = GridSearchCV(estimator=lgb_estimator, param_grid=param_grid, cv=gkf)
lgb_model = gsearch.fit(X=train, y=y)

print(lgb_model.best_params_, lgb_model.best_score_)

This seems to work, but it produces a UserWarning:

categorical_feature keyword has been found in params and will be ignored. Please use categorical_feature argument of the Dataset constructor to pass this parameter.

I am looking for a working solution, or perhaps a suggestion on how to ensure that lightgbm accepts the categorical features in the above code.

asked Jun 04 '18 by bhaskarc

People also ask

How do you use grid search?

Brief overview of grid search:

1. Prepare the dataset.
2. Identify the model's hyperparameters to optimize, then select the hyperparameter values we want to test.
3. Assess the error score for each combination in the hyperparameter grid.
4. Select the hyperparameter combination with the best error metric.
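As a rough sketch of those four steps on a made-up dataset (not the question's data; the model and values here are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid, cross_val_score

# 1. Prepare the dataset
X, y = make_classification(n_samples=500, random_state=0)

# 2. Choose the hyperparameter and the candidate values to test
grid = ParameterGrid({'C': [0.1, 1.0, 10.0]})

# 3. Score every combination with cross-validation
scores = {tuple(params.items()): cross_val_score(LogisticRegression(max_iter=1000, **params), X, y, cv=5).mean()
          for params in grid}

# 4. Select the combination with the best score
best = max(scores, key=scores.get)
print(dict(best), scores[best])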

How do I use the grid search to tune hyperparameters?

Grid search is the simplest algorithm for hyperparameter tuning. Basically, we divide the domain of the hyperparameters into a discrete grid. Then we try every combination of values in this grid, calculating a performance metric for each one using cross-validation.
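With scikit-learn, GridSearchCV does exactly this; a minimal sketch on a toy dataset (all names and values here are illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# The discrete grid: every combination of these values is tried
param_grid = {'C': [0.1, 1.0, 10.0], 'solver': ['lbfgs', 'liblinear']}

search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=5, scoring='accuracy')
search.fit(X, y)
print(search.best_params_, search.best_score_)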

Is cross-validation used in grid search?

Cross-validation and GridSearchCV: in GridSearchCV, cross-validation is performed along with the grid search. Cross-validation is used while training the model: before training the model on the data, we divide the data into two parts, train data and test data.
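A small sketch of that split (illustrative data; the held-out test set is never seen by the grid search, which cross-validates on the training part only):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# Hold out a test set that the grid search never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# cv=5 re-splits the training data into 5 folds for every parameter combination
search = GridSearchCV(LogisticRegression(max_iter=1000), {'C': [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# The held-out test set gives an unbiased estimate of the selected model
print(search.score(X_test, y_test))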

What is GridSearchCV Best_score_?

The grid.best_score_ is the average score over all cv folds for a single combination of the parameters you specify in tuned_params, namely the best combination found. To access other relevant details about the grid search process, you can look at grid.cv_results_.
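For instance, assuming a fitted GridSearchCV object named search (the name is arbitrary here):

import pandas as pd

# Mean cross-validated score of the best parameter combination
print(search.best_score_)

# The combination that achieved it
print(search.best_params_)

# Per-combination details: mean/std test scores, fit times, ranks
print(pd.DataFrame(search.cv_results_)[['params', 'mean_test_score', 'rank_test_score']])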


2 Answers

As the warning states, categorical_feature is not one of the LGBMModel arguments. It is relevant to lgb.Dataset instantiation, which in the case of the sklearn API is done directly in the fit() method (see the docs). Thus, in order to pass it through the GridSearchCV optimisation, one has to provide it as an argument of the GridSearchCV.fit() method (as of sklearn v0.19.1), or as an additional fit_params argument when instantiating GridSearchCV in older sklearn versions.
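With the question's train, y, param_grid and indexes_of_categories, that would look roughly like this (a minimal sketch; the estimator is trimmed down and a plain cv=5 is used for brevity):

# Leave categorical_feature out of the constructor...
lgb_estimator = lgb.LGBMClassifier(boosting_type='gbdt', objective='binary',
                                   learning_rate=0.01)

gsearch = GridSearchCV(estimator=lgb_estimator, param_grid=param_grid, cv=5)

# ...and pass it to fit() instead; GridSearchCV forwards extra fit() keyword
# arguments to the underlying estimator's fit()
lgb_model = gsearch.fit(X=train, y=y,
                        categorical_feature=indexes_of_categories)

print(lgb_model.best_params_, lgb_model.best_score_)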

answered by Mischa Lisovyi

In case you are struggling with how to pass the fit_params, which happened to me as well, this is how you should do that:

# Pass categorical_feature as a fit parameter rather than to the constructor
fit_params = {'categorical_feature': indexes_of_categories}

clf = GridSearchCV(model, param_grid, cv=n_folds)
clf.fit(x_train, y_train, **fit_params)
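As a side note, an alternative sketch (assuming the categoricals list and train DataFrame from the question): casting the categorical columns to the pandas 'category' dtype lets LightGBM's sklearn wrapper detect them automatically, since categorical_feature defaults to 'auto' in fit():

for col in categoricals:
    train[col] = train[col].astype('category')

# With 'category' dtype columns, LGBMClassifier picks up the categoricals
# on its own, so no extra fit parameter is needed
clf = GridSearchCV(lgb.LGBMClassifier(objective='binary'), param_grid, cv=5)
clf.fit(train, y)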
answered by saeedghadiri