I would like to add an oversampling procedure, such as SMOTE, to scikit-learn's Pipeline. But transformers only support the fit and transform methods, and do not provide a way to increase the number of samples and targets.
One possible way to do this is to break the pipeline into two separate pipelines connected by SMOTE sampling.
Are there any better solutions?
SMOTE can be used as a step in a pipeline, but the pipeline has to be the imblearn Pipeline rather than the scikit-learn one. SMOTE exposes 'fit_resample' instead of 'transform' or 'fit_transform', so it cannot be used as an intermediate step in a scikit-learn Pipeline.
Random oversampling involves randomly selecting examples from the minority class, with replacement, and adding them to the training dataset. Random undersampling involves randomly selecting examples from the majority class and deleting them from the training dataset.
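Both ideas can be sketched in plain Python. The function names `random_oversample` and `random_undersample` are hypothetical, and balancing every class to the size of the largest (or smallest) class is an assumption made here for illustration; in practice a library sampler would normally be used.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples, with replacement, until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(indices)  # sampling with replacement
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class
    has as many samples as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(indices, target))  # sampling without replacement
    keep.sort()  # preserve the original sample order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: six majority samples (class 0) and two minority samples (class 1).
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5], [5.0], [5.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```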
Imbalanced data is a problem when creating a predictive machine learning model. One way to alleviate this problem is by oversampling the minority class. Instead of oversampling by replicating existing samples, we can oversample by creating synthetic samples using the SMOTE technique.
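The core idea behind SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration rather than the library implementation; the function name `smote_sketch` and its parameters are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the line segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features; values are illustrative only.
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Unlike random oversampling, the new points are not exact copies of existing samples, although they always lie between two existing minority samples.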
Our current Pipeline does not support changing the number of samples between steps, as the Transformer.transform method does not return the y argument, which would also need to be resampled. This is a known limitation of the current design. It might be fixed in a future version, but we have not started to work on that yet.
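The limitation can be illustrated with two toy classes. Both are hypothetical and written only for this example; the `fit_resample` name mirrors the sampler interface used by imbalanced-learn.

```python
class DuplicateRows:
    """Toy transformer that tries to oversample inside transform."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Only X can be returned, so there is no way to hand back a longer y.
        return X + X


class DuplicateRowsSampler:
    """Toy sampler: returns X and y together, so both stay aligned."""

    def fit_resample(self, X, y):
        return X + X, y + y


X = [[0.0], [1.0], [2.0]]
y = [0, 0, 1]

X_t = DuplicateRows().fit(X, y).transform(X)
print(len(X_t), len(y))  # 6 3: features and targets no longer line up

X_r, y_r = DuplicateRowsSampler().fit_resample(X, y)
print(len(X_r), len(y_r))  # 6 6
```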