I need to select some features from a dataset for a regression task, but the numerical values of the features are on very different scales.
from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectKBest, f_regression
X, y = load_boston(return_X_y=True)
X_new = SelectKBest(f_regression, k=2).fit_transform(X, y)
To increase the performance of the regression model, do I need to normalize X before applying SelectKBest?
It depends on your data and on the model you train afterwards, so try it and compare. Note that f_regression scores each feature by its correlation with the target, which is not affected by standardization, so scaling will not change which features SelectKBest picks; it mainly helps scale-sensitive models (e.g. regularized or distance-based ones) fitted on the selected features. Here's a quick way to transform each variable so that it has a mean of 0 and a variance of 1:
from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler

X, y = load_boston(return_X_y=True)

# Standardize each feature to mean 0 and unit variance
X = StandardScaler().fit_transform(X)

# Keep the 2 features with the highest univariate F scores
X_new = SelectKBest(f_regression, k=2).fit_transform(X, y)
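If the goal is to measure whether scaling actually improves the final model, one option is to chain the scaler, the selector, and an estimator in a Pipeline and cross-validate the whole thing, so the scaler and selector are only fitted on the training folds. This is just a sketch: the Ridge estimator, k=2, and the 5-fold R^2 scoring are arbitrary choices, not something from the question, and load_boston was removed in scikit-learn 1.2, so substitute any regression dataset there.

from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; any (X, y) regression data works
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_boston(return_X_y=True)

# Scaling and selection are refit on each training fold, avoiding leakage
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_regression, k=2)),
    ("model", Ridge()),
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
print(scores.mean())

You can then drop the "scale" step from the pipeline and rerun the cross-validation to see whether standardization makes a difference for your particular data and model.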