Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the difference between cross-validation and grid search?

In simple words, what is the difference between cross-validation and grid search? How does grid search work? Should I do first a cross-validation and then a grid search?

like image 964
Linda Avatar asked Oct 12 '13 14:10

Linda


People also ask

Is cross-validation necessary for grid search?

Conclusion: By using cross validation and grid search we were able to have a more meaningful result when compared to our original train/test split with minimal tuning. Cross validation is a very important method used to create better fitting models by training and testing on all parts of the training dataset.

Is GridSearchCV cross-validation?

Yes, GridSearchCV performs cross-validation. If I understand the concept correctly - you want to keep part of your data set unseen for the model in order to test it. So you train your models against train data set and test them on a testing data set.

What is the difference between grid search and randomized search?

The key difference from grid search is in random search, not all the values are tested and values tested are selected at random. For example, if there are 500 values in the distribution and if we input n_iter=50 then random search will randomly sample 50 values to test.

What is the difference between cross-validation and K fold?

When people refer to cross validation they generally mean k-fold cross validation. In k-fold cross validation what you do is just that you have multiple(k) train-test sets instead of 1. This basically means that in a k-fold CV you will be training your model k-times and also testing it k-times.


2 Answers

Cross-validation is when you reserve part of your data to use in evaluating your model. There are different cross-validation methods. The simplest conceptually is to just take 70% (just making up a number here, it doesn't have to be 70%) of your data and use that for training, and then use the remaining 30% of the data to evaluate the model's performance. The reason you need different data for training and evaluating the model is to protect against overfitting. There are other (slightly more involved) cross-validation techniques, of course, like k-fold cross-validation, which often used in practice.

Grid search is a method to perform hyper-parameter optimisation, that is, it is a method to find the best combination of hyper-parameters (an example of an hyper-parameter is the learning rate of the optimiser), for a given model (e.g. a CNN) and test dataset. In this scenario, you have several models, each with a different combination of hyper-parameters. Each of these combinations of parameters, which correspond to a single model, can be said to lie on a point of a "grid". The goal is then to train each of these models and evaluate them e.g. using cross-validation. You then select the one that performed best.

To give a concrete example, if you're using a support vector machine, you could use different values for gamma and C. So, for example, you could have a grid with the following values for (gamma, C): (1, 1), (0.1, 1), (1, 10), (0.1, 10). It's a grid because it's like a product of [1, 0.1] for gamma and [1, 10] for C. Grid-search would basically train a SVM for each of these four pair of (gamma, C) values, then evaluate it using cross-validation, and select the one that did best.

like image 55
Or Neeman Avatar answered Oct 15 '22 02:10

Or Neeman


Cross-validation is a method for robustly estimating test-set performance (generalization) of a model. Grid-search is a way to select the best of a family of models, parametrized by a grid of parameters.

Here, by "model", I don't mean a trained instance, more the algorithms together with the parameters, such as SVC(C=1, kernel='poly').

like image 45
Andreas Mueller Avatar answered Oct 15 '22 00:10

Andreas Mueller