Some articles say that when you have only train and test sets, you should first use fit_transform() to scale the training set and then only transform() for the test set, in order to prevent data leakage.
In my case, I also have a validation set.
I think one of the two snippets below is correct, but I am not sure which one. Any kind of help will be appreciated, thanks!
1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 2/7)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 2/7)
X_test = scaler.transform(X_test)
We have to train multiple models by trying different combinations of hyperparameters. Then, we evaluate the performance of each model on the validation set. Therefore, the validation set is useful for hyperparameter tuning or for selecting the best model out of several candidates.
In a scenario where both validation and test datasets are used, the test data set is typically used to assess the final model that is selected during the validation process.
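The workflow described above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the original question: the data is synthetic (make_classification), the model is LogisticRegression, and the tuned hyperparameter is its C value. The split and scaling follow Option 1 from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for the real X, y (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Roughly 50% train, 20% validation, 30% test, as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=2 / 7, random_state=0
)

# Fit the scaler on the training rows only, then reuse it everywhere else.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

# Hyperparameter search: each candidate is trained on the training set
# and scored on the validation set.
candidate_C = [0.01, 0.1, 1.0, 10.0]
val_scores = {}
for C in candidate_C:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train_s, y_train)
    val_scores[C] = model.score(X_val_s, y_val)

best_C = max(val_scores, key=val_scores.get)

# The test set is touched once, only for the selected model.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train_s, y_train)
test_accuracy = final_model.score(X_test_s, y_test)
print("best C:", best_C, "test accuracy:", test_accuracy)
```

The test score is computed only after best_C has been chosen, so it does not influence model selection.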
Training datasets comprise the samples used to fit the model under construction. In contrast, validation datasets contain different samples, which are used to evaluate the trained models.
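Generally you would want to use the Option 1 code. The reason for calling fit and then transform on the training data is: a) fit computes the statistics of the training set (for StandardScaler, the per-feature mean and standard deviation) and stores them in the scaler; b) transform then rescales data using those stored statistics. In Option 2 the scaler is fitted before the validation rows are split off, so the validation rows contribute to the mean and standard deviation, which leaks information from the validation set into training.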
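If you call fit again on the test set, the test data is scaled with its own statistics instead of the training statistics. The model then sees test inputs on a different scale than it was trained on, and information from the test set leaks into preprocessing, which biases your evaluation.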
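A small check can make this concrete. The sketch below uses random synthetic data (an assumption, not the asker's dataset) to show that fit() only stores training statistics, that transform() applies exactly those statistics, and that refitting on the test set produces differently scaled values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# fit() stores the per-feature mean and standard deviation of the training rows.
scaler = StandardScaler().fit(X_train)
stats_match_train = bool(
    np.allclose(scaler.mean_, X_train.mean(axis=0))
    and np.allclose(scaler.scale_, X_train.std(axis=0))
)

# transform() applies the stored training statistics to any data it is given.
X_test_scaled = scaler.transform(X_test)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
matches_manual = bool(np.allclose(X_test_scaled, manual))

# Refitting on the test set scales it with its own statistics instead,
# so the result differs from what the trained model expects.
refit_on_test = StandardScaler().fit_transform(X_test)
differs_when_refit = not np.allclose(X_test_scaled, refit_on_test)

print(stats_match_train, matches_manual, differs_when_refit)
```

The same reasoning applies to the validation set: it should be passed through transform() with the scaler fitted on the training rows.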