Why are we using MinMaxScaler(), and what does it do?
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression

scaler = MinMaxScaler()
scaler.fit(X_train)                  # learn per-feature min/max from the training set only
X_train = scaler.transform(X_train)  # scale training features into [0, 1]
X_test = scaler.transform(X_test)    # apply the same scaling to the test set

model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Core of the method
The Min-Max scaler is a way to normalize the input features/variables. It transforms each feature into the range [0, 1] (by default), so that the minimum and maximum values of that feature become 0 and 1, respectively.
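The transformation can be sketched with a toy feature column (the numbers below are purely illustrative):

```python
import numpy as np

# Toy feature column with minimum 10 and maximum 50 (illustrative values)
x = np.array([10.0, 20.0, 50.0])

# Min-max scaling: x' = (x - min) / (max - min), which maps x into [0, 1]
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.   0.25 1.  ]
```

This is exactly the formula MinMaxScaler applies per feature, using the min and max it learned during fit.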
Why normalize prior to model fitting?
The main idea behind normalization/standardization is always the same: variables measured on different scales do not contribute equally to model fitting and the learned function, and can end up biasing the model toward features with large values. To deal with this potential problem, feature-wise normalization such as min-max scaling is usually applied prior to model fitting.
More here: https://towardsdatascience.com/everything-you-need-to-know-about-min-max-normalization-in-python-b79592732b79
Essentially, the code scales the independent variables so that they lie in the range 0 to 1. This is important because some variables may take values in the thousands while others vary over a small range; scaling puts them on a common footing. Logistic regression is sensitive to such differences in magnitude. More about MinMaxScaler is here: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
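A sketch with made-up numbers showing the point above, and why the scaler is fit on the training data only and then reused to transform the test set:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: one feature in the thousands, one in a small range
X_train = np.array([[1000.0, 0.1],
                    [3000.0, 0.5],
                    [5000.0, 0.9]])
X_test = np.array([[2000.0, 0.3]])

scaler = MinMaxScaler()
scaler.fit(X_train)               # learn min/max from the training data only
print(scaler.transform(X_train))  # both columns now lie in [0, 1]
print(scaler.transform(X_test))   # test set scaled with the training min/max
```

Fitting on the training set only avoids leaking information from the test set into preprocessing; a test value outside the training min/max would simply map outside [0, 1].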