I would like to pre-train a model and then continue training it with another model: I have a DecisionTreeClassifier and would like to train it further with an LGBMClassifier. Is there a way to do this in scikit-learn?
I have already read this post about it: https://datascience.stackexchange.com/questions/28512/train-new-data-to-pre-trained-model. In the post it says:
As per the official documentation, calling fit() more than once will overwrite what was learned by any previous fit().
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import lightgbm as lgb

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Train the Decision Tree classifier
clf = DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

# Train an LGBM classifier on the same data
lgbm = lgb.LGBMClassifier()
lgbm = lgbm.fit(X_train, y_train)

# Predict the response for the test dataset
y_pred = lgbm.predict(X_test)
Perhaps you are looking for stacked classifiers. In this approach, the predictions of earlier models are available as features for later models. Look into StackingClassifier. Adapted from the documentation:
from sklearn.ensemble import StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

estimators = [
    ('dtc_model', DecisionTreeClassifier()),
]

clf = StackingClassifier(
    estimators=estimators,
    final_estimator=LGBMClassifier()
)
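Once assembled, the stacked model is fitted and used like any other scikit-learn estimator. A minimal usage sketch, assuming the X_train, y_train, and X_test from the question:

# Fitting trains the base estimators, then trains the final LGBM estimator
# on cross-validated predictions of the base estimators
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)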
Unfortunately this is not possible at present. According to the documentation at https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html?highlight=init_model, the init_model argument lets you continue training only if the initial model is itself a LightGBM model.
I did try this setup:

import pickle
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

# dtc
dtc_model = DecisionTreeClassifier()
dtc_model = dtc_model.fit(X_train, y_train)

# save the fitted tree to disk
dtc_fn = 'dtc.pickle.db'
pickle.dump(dtc_model, open(dtc_fn, 'wb'))

# lgbm, attempting to continue training from the pickled tree
lgbm_model = LGBMClassifier()
lgbm_model.fit(X_train_2, y_train_2, init_model=dtc_fn)

And I get:

LightGBMError: Unknown model format or submodel type in model file dtc.pickle.db
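For comparison, continued training does work when the initial model is itself a LightGBM model. A minimal sketch, reusing the same placeholder data splits:

from lightgbm import LGBMClassifier

# Train a first LightGBM model on the first batch of data
first = LGBMClassifier()
first.fit(X_train, y_train)

# Continue training on a second batch, starting from the first model's booster
second = LGBMClassifier()
second.fit(X_train_2, y_train_2, init_model=first.booster_)

init_model accepts a Booster or LGBMModel instance, or the filename of a model saved with booster_.save_model(); a pickled scikit-learn estimator will not work, as the error above shows.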
As @Ferdy explained in his post, there is no simple way to perform this operation, and understandably so.
Scikit-learn's DecisionTreeClassifier takes only numerical features and cannot handle NaN values, whereas LGBMClassifier can handle those.
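A minimal sketch of that difference, on hypothetical toy data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from lightgbm import LGBMClassifier

X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([0, 0, 1, 1])

# LightGBM learns a default direction for missing values at each split
LGBMClassifier(min_child_samples=1).fit(X, y)

# scikit-learn trees (before version 1.3) reject NaN input outright
try:
    DecisionTreeClassifier().fit(X, y)
except ValueError as e:
    print(e)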
By looking at the decision function of scikit-learn trees you can see that all they can perform is splits of the form feature <= threshold.

On the contrary, LGBM can perform the following kinds of splits:

feature is na
feature <= threshold
feature in categories
Splits in a decision tree are selected at each step as the ones that best split the set of items: they try to minimize the node impurity (Gini) or entropy.
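For reference, a small sketch of how Gini impurity is computed (one minus the sum of squared class proportions):

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum_i p_i^2 over the class proportions p_i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 1, 1]))  # 0.5: a perfectly mixed binary node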
The risk of further training a DecisionTreeClassifier is that you cannot be sure the splits performed in the original tree are still the best, since LGBM's additional split capabilities might, and should, lead to better performance.
I would recommend retraining the model with LGBMClassifier alone, as its splits may well differ from those of the original scikit-learn tree.