How to add oversampling/undersampling procedure in scikit's Pipeline?

I would like to add an oversampling procedure, such as SMOTE, to scikit-learn's Pipeline. However, transformers only support the fit and transform methods, and transform provides no way to change the number of samples and targets.

One possible way to do this is to split the pipeline into two separate pipelines connected by a SMOTE sampling step.
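A rough sketch of that two-pipeline workaround might look like the following. To keep it self-contained, `random_oversample` is a hypothetical helper that simply duplicates minority-class rows; it stands in for SMOTE and is not SMOTE itself. The dataset, the step names, and the choice of `StandardScaler` and `LogisticRegression` are also assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def random_oversample(X, y, random_state=0):
    """Hypothetical stand-in for SMOTE: duplicate minority rows until classes are balanced."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    indices = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw (with replacement) as many extra rows as this class is missing.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        indices.append(np.concatenate([cls_idx, extra]))
    indices = np.concatenate(indices)
    return X[indices], y[indices]


# Imbalanced toy data (assumed for the example).
X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)

# First pipeline: steps that run before resampling.
pre = Pipeline([("scale", StandardScaler())])
# Second pipeline: steps that run after resampling.
post = Pipeline([("clf", LogisticRegression())])

# Training: transform, resample X and y together, then fit the rest.
X_pre = pre.fit_transform(X)
X_res, y_res = random_oversample(X_pre, y)
post.fit(X_res, y_res)

# Prediction: no resampling, just chain the two pipelines.
pred = post.predict(pre.transform(X))
```

The drawback is that the two halves have to be kept in sync by hand, and the combined object cannot be passed as a single estimator to tools such as cross-validation helpers.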

Are there any better solutions?

760

asked Mar 29 '15 14:03

Shuai Zhang

1 Answer

Our current Pipeline does not support changing the number of samples between steps, because the Transformer.transform method does not return the y argument, which would also need to be resampled. This is a known limitation of the current design. It might be fixed in a future version, but we have not started working on that yet.
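To make the limitation concrete, here is a minimal sketch of what a pipeline would have to do differently: a resampling step must return both X and y during fit, and must be skipped at prediction time. `ResamplingPipeline` and `RandomOverSampler` below are hypothetical classes written for this illustration; they are not part of scikit-learn. The method name `fit_resample` follows the convention of the third-party imbalanced-learn package, and the sampler here only duplicates minority rows rather than implementing SMOTE.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


class RandomOverSampler:
    """Hypothetical sampler: returns both X and y, unlike Transformer.transform."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = np.random.default_rng(self.random_state)
        classes, counts = np.unique(y, return_counts=True)
        keep = [np.arange(len(y))]  # start with every original row
        for cls, count in zip(classes, counts):
            cls_idx = np.flatnonzero(y == cls)
            # Add duplicates until this class matches the largest class.
            keep.append(rng.choice(cls_idx, size=counts.max() - count, replace=True))
        idx = np.concatenate(keep)
        return X[idx], y[idx]


class ResamplingPipeline:
    """Hypothetical minimal pipeline in which samplers run during fit only."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            if hasattr(step, "fit_resample"):
                # Samplers change the number of rows, so y must be updated too.
                X, y = step.fit_resample(X, y)
            else:
                X = step.fit_transform(X, y)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            # Samplers are skipped: test data must not be resampled.
            if not hasattr(step, "fit_resample"):
                X = step.transform(X)
        return self.steps[-1][1].predict(X)


X, y = make_classification(n_samples=200, weights=[0.9, 0.1], random_state=0)
pipe = ResamplingPipeline([
    ("scale", StandardScaler()),
    ("sample", RandomOverSampler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X, y)
pred = pipe.predict(X)
```

This sketch leaves out everything a real Pipeline provides (parameter handling, cloning, caching), so it is only meant to show where y has to travel alongside X.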

122

answered Nov 15 '22 08:11

ogrisel

Tags:

python

scikit-learn
