Do I need normalization before SelectKBest in Python?

I need to select some features from a dataset for a regression task, but the numerical features are on very different ranges.

from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectKBest, f_regression

X, y = load_boston(return_X_y=True)
X_new = SelectKBest(f_regression, k=2).fit_transform(X, y)

To improve the performance of the regression model, do I need to normalize X before applying SelectKBest?

user3104352 asked Nov 07 '22 01:11



1 Answer

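The answer is that it depends on your data and on the model you fit afterwards, so you should try it to see if it helps! Keep in mind that f_regression scores each feature by its correlation with the target, and correlation is unaffected by shifting or rescaling a feature, so standardizing should not change which features SelectKBest picks here. Any benefit would come from the regression model you train on the selected features. Here's a quick way to transform each variable so that it has a mean of 0 and a variance of 1: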

from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler

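# Note: load_boston was removed in scikit-learn 1.2, so this needs an older version.
X, y = load_boston(return_X_y=True)

# Standardize each column to zero mean and unit variance.
scaler_x = StandardScaler().fit(X)
X = scaler_x.transform(X)

# Keep the 2 features with the highest f_regression scores.
X_new = SelectKBest(f_regression, k=2).fit_transform(X, y)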
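
One way to "try it and see" is to compare the selection with and without scaling. The sketch below is only an illustration: it uses synthetic data from make_regression as a stand-in for a real dataset (so it does not depend on load_boston), and the column scales and offsets are made-up values chosen to mimic features on very different ranges.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real dataset (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       noise=10.0, random_state=0)

# Put the columns on very different ranges, as described in the question.
# The scales and offsets are arbitrary, hypothetical values.
rng = np.random.default_rng(0)
scales = rng.uniform(0.1, 1000.0, size=X.shape[1])
offsets = rng.uniform(-50.0, 50.0, size=X.shape[1])
X_raw = X * scales + offsets

# Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_raw)

# Run the same selection on the raw and on the standardized features.
sel_raw = SelectKBest(f_regression, k=2).fit(X_raw, y)
sel_scaled = SelectKBest(f_regression, k=2).fit(X_scaled, y)

# Compare which columns were kept and the per-feature F-scores.
same_features = np.array_equal(sel_raw.get_support(), sel_scaled.get_support())
scores_match = np.allclose(sel_raw.scores_, sel_scaled.scores_)
print("same features selected:", same_features)
print("F-scores match:", scores_match)
```

Both checks should come out true up to floating-point error, because f_regression is built on the correlation between each feature and y. Scaling can still matter for the model you fit afterwards (for example, regularized or distance-based regressors), so that is where the comparison is worth making on your own data.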
killian95 answered Nov 15 '22 06:11
