Python - LightGBM with GridSearchCV, is running forever

Recently I have been running several experiments to compare XGBoost and LightGBM in Python. LightGBM is a newer algorithm that people say outperforms XGBoost in both speed and accuracy.

This is the LightGBM GitHub repository, and this is the LightGBM Python API documentation, where you will find the Python functions you can call. They can be called directly on the LightGBM model or through the LightGBM scikit-learn wrapper.

This is the XGBoost Python API I use. As you can see, its structure is very similar to the LightGBM Python API above.

Here is what I tried:

1. If you use the train() method in both XGBoost and LightGBM, then yes, LightGBM runs faster and has higher accuracy. But this method doesn't do cross-validation.
2. The cv() method in both libraries does cross-validation. However, I didn't find a way to make it return a set of optimal parameters.
3. If you try scikit-learn's GridSearchCV() with LGBMClassifier and XGBClassifier, it works for XGBClassifier, but for LGBMClassifier it runs forever.

Here are my code examples when using GridSearchCV() with both classifiers:

XGBClassifier with GridSearchCV

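from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Only the number of boosting rounds is searched.
param_set = {'n_estimators': [50, 100, 500, 1000]}

gsearch = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1, n_estimators=100, max_depth=5,
        min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8,
        nthread=7,  # threads used inside each XGBoost fit
        objective='binary:logistic', scale_pos_weight=1, seed=410),
    param_grid=param_set, scoring='roc_auc',
    n_jobs=7,  # parallel GridSearchCV workers
    cv=10)

# features_train and label_train are the training data prepared earlier.
xgb_model2 = gsearch.fit(features_train, label_train)
xgb_model2.cv_results_, xgb_model2.best_params_, xgb_model2.best_score_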

This works very well for XGBoost and only took a few seconds.

LightGBM with GridSearchCV

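from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

# Only the number of boosting rounds is searched.
param_set = {'n_estimators': [20, 50]}

gsearch = GridSearchCV(
    estimator=LGBMClassifier(
        boosting_type='gbdt', num_leaves=30, max_depth=5, learning_rate=0.1,
        n_estimators=50, max_bin=225,
        subsample_for_bin=0.8,  # documented as an integer sample count
        objective=None, min_split_gain=0, min_child_weight=5,
        min_child_samples=10, subsample=1, subsample_freq=1,
        colsample_bytree=1, reg_alpha=1, reg_lambda=0, seed=410,
        nthread=7,  # threads used inside each LightGBM fit
        silent=True),
    param_grid=param_set, scoring='roc_auc',
    n_jobs=7,  # parallel GridSearchCV workers
    cv=10)

# features_train and label_train are the same training data as above.
lgb_model2 = gsearch.fit(features_train, label_train)
lgb_model2.cv_results_, lgb_model2.best_params_, lgb_model2.best_score_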

However, with this method LightGBM has been running the whole morning today and has still produced nothing.

I am using the same dataset for both; it contains 30,000 records.

I have 2 questions:

1. If we just use the cv() method, is there any way to tune an optimal set of parameters?
2. Do you know why GridSearchCV() does not work well with LightGBM? I'm wondering whether this only happens to me, or whether it has happened to others too.

asked Jul 11 '17 23:07

Cherry Wu

1 Answer

Try to use n_jobs = 1 and see if it works.
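One possible reason this helps (an assumption on my part, not something confirmed in the question): GridSearchCV with n_jobs=7 can run up to seven fits at once, and each estimator is itself configured with nthread=7, so up to 49 threads may end up competing for the CPU. The sketch below only does that arithmetic. The helper names requested_threads and inner_threads are made up for illustration and are not part of scikit-learn, XGBoost, or LightGBM.

```python
import os


def requested_threads(outer_jobs: int, threads_per_fit: int) -> int:
    """Worst-case number of threads that can be running at the same time."""
    return outer_jobs * threads_per_fit


def inner_threads(total_cores: int, outer_jobs: int) -> int:
    """Threads each fit may use so that outer_jobs * threads <= total_cores."""
    if total_cores < 1 or outer_jobs < 1:
        raise ValueError("total_cores and outer_jobs must be positive")
    return max(1, total_cores // outer_jobs)


# Configuration from the question: 7 search workers, 7 threads per fit.
print(requested_threads(7, 7))  # 49

# Per-fit thread budget on this machine if 7 search workers are kept.
cores = os.cpu_count() or 1
print(inner_threads(cores, 7))
```

If oversubscription is the problem, either n_jobs=1 in GridSearchCV (letting the estimator use its own threads) or a single thread per estimator with several search workers should keep the total close to the number of cores.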

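In general, if you use n_jobs = -1 or n_jobs > 1, you should protect your script with an if __name__ == '__main__': guard, because on some platforms (notably Windows) the worker processes re-import the main module and unguarded code would run again in each of them.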

Simple Example:

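import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

if __name__ == '__main__':
    # Columns 0-25 are the features; column 26 is the label.
    data = pd.read_csv('Prior Decompo2.csv', header=None)
    X, y = data.iloc[:, 0:26].values, data.iloc[:, 26].values

    param_grid = {'C': [0.01, 0.1, 1, 10], 'kernel': ('rbf', 'linear')}
    classifier = SVC()
    # n_jobs=-1 uses all available cores for the search.
    grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid,
                               scoring='accuracy', n_jobs=-1, verbose=42)
    grid_search.fit(X, y)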

Finally, can you try running your code with n_jobs = -1 and the if __name__ == '__main__': guard, as I explained, and see if it works?
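On the first question (tuning with cv() alone), one option is to enumerate the parameter grid by hand and call cv() once per combination, keeping the best result. The following is a minimal sketch of that loop using only the standard library. The function grid_search_with_cv and the scorer toy_cv_score are hypothetical names; toy_cv_score is a stand-in for a wrapper that would call the library's cv() with the given parameters and return the final mean validation metric.

```python
from itertools import product


def grid_search_with_cv(param_grid, cv_score, higher_is_better=True):
    """Try every combination in param_grid and return (best_params, best_score).

    cv_score is any callable that takes a dict of parameters and returns
    one cross-validated score for it.
    """
    names = sorted(param_grid)
    best_params, best_score = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)
        if best_score is None:
            improved = True
        elif higher_is_better:
            improved = score > best_score
        else:
            improved = score < best_score
        if improved:
            best_params, best_score = params, score
    return best_params, best_score


def toy_cv_score(params):
    # Stand-in scorer (an assumption for illustration only): it peaks at
    # num_leaves=30 and learning_rate=0.1. Replace it with a real cv() wrapper.
    return -abs(params["num_leaves"] - 30) - 10 * abs(params["learning_rate"] - 0.1)


grid = {"num_leaves": [15, 30, 60], "learning_rate": [0.05, 0.1]}
best_params, best_score = grid_search_with_cv(grid, toy_cv_score)
print(best_params, best_score)
```

Because each cv() call already runs its folds internally, this loop needs no outer process pool, which sidesteps the n_jobs issue entirely at the cost of evaluating the combinations one after another.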

answered Oct 02 '22 18:10

seralouk

Tags: python, xgboost, cross-validation, grid-search, lightgbm
