Does GridSearchCV perform cross-validation?

I'm currently working on a problem that compares the performance of three different machine learning algorithms on the same data set. I divided the data set into 70/30 training/testing sets and then performed a grid search for the best parameters of each algorithm using GridSearchCV with X_train, y_train.

First question: am I supposed to perform the grid search on the training set, or on the whole data set?

Second question: I know that GridSearchCV uses K-fold in its implementation. Does that mean I performed cross-validation if I used the same X_train, y_train in GridSearchCV for all three algorithms I compare?

Any answer would be appreciated, thank you.

413

asked Mar 07 '18 19:03

kevinH

2 Answers

All estimators in scikit-learn whose names end with CV perform cross-validation. But you still need to keep a separate test set for measuring performance.

So you need to split your whole data set into train and test sets. Forget about the test data for a while.

Then pass only the training data to the grid search. GridSearchCV splits the training data further into train and validation folds to tune the hyper-parameters passed to it. Finally, by default, it refits the model on the whole training data with the best parameters found.
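To see the internal folds for yourself, here is a minimal sketch. The iris data, the SVC estimator, the parameter grid, and cv=5 are my own choices for illustration, not something from the question:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Illustrative data set (assumption): any X, y would do here.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# cv=5 asks for 5-fold cross-validation on the training data only.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# cv_results_ holds one validation-score column per fold.
fold_columns = sorted(
    key for key in search.cv_results_
    if key.startswith("split") and key.endswith("_test_score")
)
print(fold_columns)
print(search.best_params_)

# With the default refit=True, best_estimator_ has been refit on all of X_train.
print(search.best_estimator_)
```

Note that the "test_score" entries in cv_results_ are scores on the held-out folds inside the training data, not on X_test.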

Now test this model on the test data you set aside at the beginning. This gives you an estimate close to the real-world performance of the model.

If you pass the whole data set to GridSearchCV, test data leaks into the parameter tuning, and the final model may not perform as well on new, unseen data.
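The workflow described above could look roughly like this. It is only a sketch: the breast cancer data set, the scaled logistic regression pipeline, and the grid values are assumptions picked for the example:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data set (assumption).
X, y = load_breast_cancer(return_X_y=True)

# 1. Split once and set the test set aside.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# 2. Tune on the training data only. The scaler sits inside the pipeline,
#    so it is fit on each training fold and never sees the validation fold.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)

# 3. Evaluate the refit best model on the untouched test set.
cv_estimate = search.best_score_            # mean validation accuracy across folds
test_accuracy = search.score(X_test, y_test)
print(f"cross-validated accuracy: {cv_estimate:.3f}")
print(f"held-out test accuracy:   {test_accuracy:.3f}")
```

The cross-validated score was used to pick the parameters, so the held-out test accuracy is the one to report.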

You can look at my other answers which describe the GridSearch in more detail:

Model help using Scikit-learn when using GridSearch
scikit-learn GridSearchCV with multiple repetitions

124

answered Sep 27 '22 17:09

Vivek Kumar

Yes, GridSearchCV performs cross-validation. If I understand the concept correctly, you want to keep part of your data set unseen by the model in order to test it.

So you train your models on the training set and test them on the test set.
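For the three-algorithm comparison from the question, something along these lines might work. The three estimators, their grids, and the shared StratifiedKFold splitter are hypothetical choices of mine, since the question does not say which algorithms are being compared:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Illustrative data set (assumption).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One splitter shared by every search, so all models see the same folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical candidates and grids, for illustration only.
candidates = {
    "svc": (SVC(), {"C": [0.1, 1, 10]}),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
    "forest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv)
    search.fit(X_train, y_train)                        # tuning uses training data only
    results[name] = {
        "best_params": search.best_params_,
        "cv_score": search.best_score_,                 # mean validation accuracy
        "test_score": search.score(X_test, y_test),     # held-out accuracy
    }

for name, summary in results.items():
    print(name, summary)
```

Each model is tuned with cross-validation on the same X_train, y_train and then scored once on the same X_test, y_test, which keeps the comparison fair.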

Here I was doing almost the same - you might want to check it...

answered Sep 27 '22 17:09

MaxU - stop WAR against UA
