I am trying to understand exactly how GridSearchCV in scikit-learn implements the train-validation-test principle in machine learning. My questions refer to the code shown below; the numbered comments mark the lines I reference.
Question 1: what exactly is going on in step 3 (the GridSearchCV fit on line 8) with respect to the parameter space? Is GridSearchCV trying every parameter combination on every one of the five runs (5-fold CV), giving a total of 10 runs? (i.e., the single value from 'optimizers', 'init', and 'batches' is paired with each of the 2 values from 'epochs')
Question 2: what score does the 'cross_val_score' line (line 10) print? Is it the average of the scores of the runs above, each measured on the single held-out fold of that run (i.e., the average over five folds of 15% of the entire dataset each)?
Question 3: suppose line 5 now has only 1 parameter value. In that case GridSearchCV is not really searching over any parameters, because every parameter has only 1 value; is this correct?
Question 4: in the case described in Question 3, if we take a weighted average of the scores computed on the 5 folds of the GridSearchCV runs and on the held-out run, that gives us an average performance score over the entire dataset. This is very similar to a 6-fold cross-validation experiment (i.e., without grid search), except that the 6 folds are not all the same size. Or is it not?
Many thanks in advance for any replies!
# dataset and create_model are assumed to be defined earlier
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from keras.wrappers.scikit_learn import KerasClassifier

X_train_data, X_test_data, y_train, y_test = \
    train_test_split(dataset[:, 0:8], dataset[:, 8],
                     test_size=0.25,
                     random_state=42)  # line 1
model = KerasClassifier(build_fn=create_model, verbose=0)
optimizers = ['adam']  # line 3
init = ['uniform']
epochs = [10, 20]  # line 5
batches = [5]  # line 6
param_grid = dict(optimizer=optimizers, epochs=epochs, batch_size=batches, init=init)
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)  # line 8
grid_result = grid.fit(X_train_data, y_train)
cross_val_score(grid.best_estimator_, X_train_data, y_train, cv=5).mean()  # line 10
best_param_ann = grid.best_params_  # line 11
best_estimator = grid.best_estimator_
heldout_predictions = best_estimator.predict(X_test_data)  # line 13
Question 1: As you said, your dataset will be split into 5 folds. Every parameter combination will be tried (in your case there are 2). For each combination, a model is trained on 4 of the 5 folds, and the remaining fold is used as the test fold. So you are right: in your example you are going to train a model 10 times (2 combinations x 5 folds).
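To see this concretely, you can inspect grid_result.cv_results_ after fitting. A minimal sketch (assuming the grid_result object from your code above; pandas is used here only to display the results):

import pandas as pd

cv_results = pd.DataFrame(grid_result.cv_results_)
print(len(cv_results))  # 2 rows: one per parameter combination (epochs=10 and epochs=20)
# each row has one score column per fold, so 2 combinations x 5 folds = 10 trained models
print(cv_results[['params', 'split0_test_score', 'split4_test_score', 'mean_test_score']])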
Question 2: 'cross_val_score' gives the average score (accuracy, loss, or whatever the estimator's scoring is) over the 5 held-out test folds. Averaging is done to avoid, for example, getting a good result just because one particular test fold happened to be easy.
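A minimal sketch of what line 10 computes (using the same X_train_data and y_train as in your code): cross_val_score returns one score per fold, and .mean() collapses them into the single number you see.

from sklearn.model_selection import cross_val_score

fold_scores = cross_val_score(grid.best_estimator_, X_train_data, y_train, cv=5)
print(fold_scores)         # 5 scores, one per held-out fold (each fold is ~15% of the full dataset)
print(fold_scores.mean())  # the value produced by line 10

Note that this call clones and refits the best estimator on 5 fresh train/validation splits of X_train_data, rather than reusing the scores computed during the grid search itself.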
Question 3: Yes. It makes no sense to do a grid search if you have only one combination of parameters to try.
Question 4: I didn't exactly understand your question. Usually, you run the grid search on your training set only. This allows you to keep your test set as a final validation set. Without cross-validation, you could find a parameter setting that maximises results on your test set, and you would then be overfitting the test set. With cross-validation, you can play as much as you want with fine-tuning the parameters, because you never use your held-out set to choose them.
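In code terms, the final held-out check would look something like the sketch below (accuracy_score is just one possible metric; your snippet stops at the predictions):

from sklearn.metrics import accuracy_score

# X_test_data / y_test from line 1 were never seen during the grid search,
# so this score is an honest estimate of performance on unseen data
heldout_predictions = best_estimator.predict(X_test_data)
print(accuracy_score(y_test, heldout_predictions))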
In your code there is not a great need for CV, because you don't have many parameter values to play with, but once you start adding things like regularization you may easily be trying 10+ combinations, and in such cases CV is required.
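For example, a larger grid might look like the sketch below (dropout_rate is a hypothetical argument that create_model would have to accept):

# 3 optimizers x 2 epochs x 2 batch sizes x 2 inits x 3 dropout rates = 72 combinations,
# i.e. 72 x 5 = 360 fits with cv=5 -- this is where cross-validation really pays off
param_grid = dict(optimizer=['adam', 'rmsprop', 'sgd'],
                  epochs=[10, 20],
                  batch_size=[5, 10],
                  init=['uniform', 'normal'],
                  dropout_rate=[0.0, 0.2, 0.5])
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)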
I hope this helps.