How to select only a few columns in a scikit-learn column selector pipeline?

Tags: python, pandas, scikit-learn

I was reading the scikit-learn documentation on the column transformer. The given example (https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html#sklearn.compose.make_column_selector) works, but when I try to select only a few columns, it gives me an error.

MWE

import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector

df = sns.load_dataset('tips')
mycols = ['tip','sex']


ct = make_column_transformer(make_column_selector(pattern=mycols))
ct.fit_transform(df)

Required

I want only the selected columns in the output.

NOTE
Of course, I know I can do df[mycols]; I am looking for a scikit-learn pipeline example.

asked Jun 16 '20 19:06

BhishanPoudel

2 Answers

If you don't mind using mlxtend, it has a built-in transformer for that.

Using mlxtend

from mlxtend.feature_selection import ColumnSelector

pipe = ColumnSelector(mycols)
pipe.fit_transform(df)

For sklearn >= 0.20

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import seaborn as sns

df = sns.load_dataset('tips')
mycols = ['tip','sex']

pipeline = Pipeline([
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])

pipeline.fit_transform(df)
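
If you would rather keep make_column_selector from the question, here is a minimal sketch of one way that should work. It assumes that the original call fails for two reasons: pattern expects a single regex string rather than a list, and make_column_transformer expects (transformer, columns) tuples. The small DataFrame is a hand-made stand-in for the tips data (the values are illustrative only), so the snippet does not need seaborn.

```python
import re

import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer

# Small hand-made stand-in for the tips data (values are illustrative only).
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "smoker": ["No", "No", "No"],
})
mycols = ["tip", "sex"]

# `pattern` must be one regex string, so join the names into an anchored
# alternation that matches each column name exactly.
pattern = "^(?:" + "|".join(re.escape(c) for c in mycols) + ")$"
selector = make_column_selector(pattern=pattern)

# make_column_transformer takes (transformer, columns) tuples;
# "passthrough" keeps the selected columns unchanged.
ct = make_column_transformer(("passthrough", selector))
out = ct.fit_transform(df)
```

Note that the selector returns the matching columns in the DataFrame's own column order, which may differ from the order in mycols.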

For sklearn < 0.20

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class FeatureSelector(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X[self.columns]


pipeline = Pipeline([('selector', FeatureSelector(columns=mycols))
                     ])

pipeline.fit_transform(df)[:5]

answered Oct 14 '22 01:10

BhishanPoudel

I'm maybe a bit late, but you can also select columns using sklearn's ColumnTransformer by setting the transformer to "passthrough" and remainder="drop":

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


pipe = Pipeline([
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])
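
One thing to be aware of: with default settings the pipeline above returns a NumPy array, not a DataFrame. A rough sketch of putting the column names back is shown here, again using a small hand-made frame in place of the tips data (the values and the name selected are only for illustration).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Small hand-made stand-in for the tips data (values are illustrative only).
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
})
mycols = ["tip", "sex"]

pipe = Pipeline([
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])

# With default output settings this is a plain NumPy array.
out = pipe.fit_transform(df)

# Columns given as a list of names come out in that order, so the
# labels in mycols can be reattached directly.
selected = pd.DataFrame(out, columns=mycols, index=df.index)
```

Because the selected columns mix numbers and strings, the array (and therefore the rebuilt DataFrame) has object dtype; convert individual columns back with astype if that matters downstream.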

answered Oct 13 '22 23:10

Jens
