Using SMOTE with GridSearchCV in scikit-learn

I'm dealing with an imbalanced dataset and want to run a grid search to tune my model's parameters using scikit-learn's GridSearchCV. To oversample the data, I want to use SMOTE, and I know I can include it as a step of a pipeline and pass that pipeline to GridSearchCV. My concern is that SMOTE will be applied to both the training and validation folds, which is not what you are supposed to do: the validation set should not be oversampled. Am I right that the whole pipeline will be applied to both dataset splits? And if so, how can I work around this? Thanks a lot in advance.

213

asked May 09 '18 04:05

Ehsan M

1 Answer

Yes, it can be done, but with imblearn's Pipeline rather than scikit-learn's.

You see, imblearn has its own Pipeline to handle the samplers correctly. I described this in a similar question here.

When predict() is called on an imblearn.pipeline.Pipeline object, it skips the sampling step and passes the data unchanged to the next transformer. You can confirm that by looking at the source code here (in newer imblearn releases the sampler method is named fit_resample rather than fit_sample):

    if hasattr(transform, "fit_sample"):
        pass
    else:
        Xt = transform.transform(Xt)

So for this to work correctly, you need the following:

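    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    # SMOTE is a pipeline step, so it runs only when the pipeline is fitted
    model = Pipeline([
        ('sampling', SMOTE()),
        ('classification', LogisticRegression())
    ])

    grid = GridSearchCV(model, params, ...)
    grid.fit(X, y)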

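Fill in the details as necessary, and the pipeline will take care of the rest: within each cross-validation split, SMOTE is applied only to the training folds when the pipeline is fitted, and the validation fold is scored without oversampling.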

187

answered Sep 22 '22 21:09

Vivek Kumar

Tags:

python

machine-learning

scikit-learn

grid-search

oversampling
