I was reading the scikit-learn tutorial about the column transformer. The given example (https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html#sklearn.compose.make_column_selector) works, but when I tried to select only a few columns, it gives me an error.
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.compose import make_column_transformer
from sklearn.compose import make_column_selector
df = sns.load_dataset('tips')
mycols = ['tip','sex']
# This is the call that raises the error
ct = make_column_transformer(make_column_selector(pattern=mycols))
ct.fit_transform(df)
I want only the selected columns in the output.
NOTE: Of course, I know I can do df[mycols]; I am looking for a scikit-learn pipeline example.
In this section, we will learn how feature selection in a scikit-learn pipeline works in Python. Feature selection is a method for choosing, possibly repeatedly, which features the pipeline uses. In the following code, we will import some libraries from which we can select the features of the pipeline.
Feature selection with SelectKBest in scikit-learn. In this post, you will learn how to do feature selection with SelectKBest in scikit-learn. Why do feature selection? 1. A more interpretable model. 2. Faster training and prediction. 3. Less storage for the model and data. How do you do feature selection with SelectKBest?
make_column_selector creates a callable that selects columns for use with ColumnTransformer. It can select columns based on data type or on column names matched with a regex. When using multiple selection criteria, all criteria must match for a column to be selected. For the pattern parameter, columns whose names contain the regex pattern are included.
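As a rough sketch of how that description applies to the question: pattern takes a single regex string rather than a list, so the column names can be joined into one alternation. The small DataFrame here is a made-up stand-in for the tips dataset, and the names df_demo and pattern are only for illustration.

```python
import re

import pandas as pd
from sklearn.compose import make_column_selector, make_column_transformer

# Made-up stand-in for the tips dataset (values are illustrative only).
df_demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01],
    "tip": [1.01, 1.66, 3.50],
    "sex": ["Female", "Male", "Male"],
    "size": [2, 3, 3],
})

mycols = ["tip", "sex"]

# Build one regex from the list. The ^...$ anchors force a full-name match,
# and the non-capturing group (?:...) avoids a pandas warning about match groups.
pattern = "^(?:" + "|".join(re.escape(c) for c in mycols) + ")$"

selector = make_column_selector(pattern=pattern)

# make_column_transformer expects (transformer, columns) tuples;
# "passthrough" keeps the selected columns unchanged.
ct = make_column_transformer(("passthrough", selector), remainder="drop")
selected = ct.fit_transform(df_demo)
```

Calling selector(df_demo) on its own returns the matched column names, which is a quick way to check the regex before fitting.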
If you don't mind using mlxtend, it has a built-in transformer for that.
from mlxtend.feature_selection import ColumnSelector
pipe = ColumnSelector(mycols)
pipe.fit_transform(df)
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import seaborn as sns
df = sns.load_dataset('tips')
mycols = ['tip','sex']
pipeline = Pipeline([
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])
pipeline.fit_transform(df)
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

class FeatureSelector(BaseEstimator, TransformerMixin):
    """Keep only the given columns of a DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; the column list is fixed at construction time.
        return self

    def transform(self, X, y=None):
        return X[self.columns]

pipeline = Pipeline([('selector', FeatureSelector(columns=mycols))])
pipeline.fit_transform(df)[:5]
I'm maybe a bit late, but you can also select columns using sklearn's ColumnTransformer() by setting the transformer to "passthrough" and remainder='drop':
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
pipe = Pipeline([
    ("selector", ColumnTransformer([
        ("selector", "passthrough", mycols)
    ], remainder="drop"))
])