scikit-learn: transformer to select columns by name

Question

Context

I am working with scikit-learn and looking for a transformer that lets me simply select which columns to keep or which columns to drop.

Problem

In practice, I would like to add a transformer step to my pipeline that lets me choose which columns to keep or which to drop. I am aware that in the example below I could simply use the remainder option, but that would not work in my real implementation, where I need to parametrize column selection so that it can easily be applied to train, test and, eventually, scoring data.

Example

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn import preprocessing
prep_pipeline = ColumnTransformer(transformers=[("std_num", preprocessing.StandardScaler(), ["a", "b"])],
                                  remainder = "passthrough")
X = pd.DataFrame([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
X.columns = ["a", "b", "c", "d"]
prep_pipeline.fit_transform(X)

Expected solution

The solution I need pipes in an additional transformer step whose only role is to select the columns ["a", "d"]; the expected output is therefore:

array([[-1.,  1.],
       [ 1., -1.]])

hatef alipoor · Accepted Answer

I think you should use scikit-learn's Pipeline and add the following class as a step in it (StandardScaler currently does not support scaling only part of a DataFrame):

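import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class DropSomeColumns(BaseEstimator, TransformerMixin):
    # Despite its name, this transformer keeps the columns listed in `cols`
    # and drops all the others.

    def __init__(self, cols):
        # store the parameter unchanged so that sklearn.base.clone keeps working
        self.cols = cols

    def fit(self, X: pd.DataFrame, y=None):
        # there is nothing to fit; y is optional so fit_transform(X) works
        return self

    def transform(self, X: pd.DataFrame):
        # accept a single column name as well as a list of names
        cols = self.cols if isinstance(self.cols, list) else [self.cols]
        return X[cols].copy()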
