Can I add outlier detection and removal to a scikit-learn Pipeline?

1 Answer

Yes. Subclass TransformerMixin to build a custom transformer. Here is one that wraps an existing outlier detection method, LocalOutlierFactor:

import numpy as np
from sklearn.base import TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.neighbors import LocalOutlierFactor

class OutlierExtractor(TransformerMixin):
    def __init__(self, **kwargs):
        """
        Create a transformer to remove outliers. A threshold is set for selection
        criteria, and further arguments are passed to the LocalOutlierFactor class

        Keyword Args:
            neg_conf_val (float): The threshold for excluding samples with a lower
               negative outlier factor.

        Returns:
            object: to be used as a transformer method as part of Pipeline()
        """

        self.threshold = kwargs.pop('neg_conf_val', -10.0)

        self.kwargs = kwargs

    def transform(self, X, y):
        """
        Uses LocalOutlierFactor class to subselect data based on some threshold

        Returns:
            tuple of ndarray: the rows of X and the matching entries of y
                whose negative outlier factor is above the threshold

        Notes:
            X should be of shape (n_samples, n_features)
        """
        X = np.asarray(X)
        y = np.asarray(y)
        lcf = LocalOutlierFactor(**self.kwargs)
        lcf.fit(X)
        # Boolean mask of the samples to keep, computed once for X and y
        keep = lcf.negative_outlier_factor_ > self.threshold
        return X[keep, :], y[keep]

    def fit(self, *args, **kwargs):
        return self

Then create a pipeline as:

pipe = Pipeline([('outliers', OutlierExtractor()), ...])
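
To see what the transformer does to the data, here is a minimal sketch that calls it directly on synthetic data. The data, the variable names, and the n_neighbors value are made up for illustration. Because transform here takes both X and y, the sketch calls it by hand rather than through a Pipeline step. Whether a given scikit-learn version accepts such a step inside Pipeline is not something the answer above confirms.

```python
import numpy as np
from sklearn.base import TransformerMixin
from sklearn.neighbors import LocalOutlierFactor


class OutlierExtractor(TransformerMixin):
    """Drop samples whose negative outlier factor is not above a threshold."""

    def __init__(self, **kwargs):
        # neg_conf_val is the threshold; everything else goes to LocalOutlierFactor
        self.threshold = kwargs.pop('neg_conf_val', -10.0)
        self.kwargs = kwargs

    def fit(self, *args, **kwargs):
        return self

    def transform(self, X, y):
        X = np.asarray(X)
        y = np.asarray(y)
        lof = LocalOutlierFactor(**self.kwargs)
        lof.fit(X)
        keep = lof.negative_outlier_factor_ > self.threshold
        return X[keep, :], y[keep]


# Synthetic data (illustrative only): a tight cluster plus one far-away point.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[100.0, 100.0]]])
y = np.arange(len(X))  # y[i] == i, so the far point is labelled 50

extractor = OutlierExtractor(neg_conf_val=-10.0, n_neighbors=10)
X_clean, y_clean = extractor.fit(X, y).transform(X, y)

print(X.shape, X_clean.shape)
print(50 in y_clean)  # whether the far-away point survived the filter
```

X and y are filtered with the same boolean mask, so they stay aligned after the outlying rows are dropped.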

answered Oct 22 '22 10:10

Attack68

Tags:

python

scikit-learn
