Which works better for optimizing hyperparameters: GridSearchCV or Bayesian optimization?
Bayesian optimization methods are efficient because they select hyperparameters in an informed manner. By prioritizing hyperparameters that appear more promising from past results, Bayesian methods can often find good hyperparameters in less time (in fewer iterations) than both grid search and random search.
Unlike grid search and random search, which treat each hyperparameter set independently, Bayesian optimization is an informed search method: it learns from previous iterations. The number of trials in this approach is set by the user.
Bayesian optimization is a powerful strategy for finding the extrema of objective functions that are expensive to evaluate. […] It is particularly useful when these evaluations are costly, when one does not have access to derivatives, or when the problem at hand is non-convex.
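To make "learning from previous iterations" concrete, here is a minimal sketch of one common flavor of Bayesian optimization: a Gaussian-process surrogate with an upper-confidence-bound acquisition rule, written with the standard library only. Everything named here is an assumption for illustration: toy_score stands in for an expensive validation score of a single hyperparameter, and the kernel length scale, noise level, kappa, and candidate grid are arbitrary choices. Real libraries use more careful surrogates and acquisition functions.

```python
import math

def toy_score(x):
    """Hypothetical stand-in for an expensive validation score (higher is better)."""
    return -(x - 0.3) ** 2

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation at x_new, given past trials (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    y_mean = sum(ys) / len(ys)  # center the scores so a zero-mean prior is reasonable
    alpha = solve(K, [y - y_mean for y in ys])
    v = solve(K, k_star)
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, math.sqrt(var)

def bayes_opt(objective, candidates, initial, n_iter=7, kappa=2.0):
    """Pick each new trial where predicted score plus uncertainty bonus is highest."""
    xs = list(initial)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]

        def ucb(c):
            mean, std = gp_posterior(xs, ys, c)
            return mean + kappa * std

        x_next = max(remaining, key=ucb)  # informed choice based on all past trials
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

candidates = [i / 100 for i in range(101)]
best_x, best_y, history = bayes_opt(toy_score, candidates, initial=[0.0, 0.5, 1.0])
print(best_x, best_y, len(history))
```

The user fixes the trial budget up front (here 3 initial points plus 7 iterations), and each new trial depends on every earlier result. That dependence is what grid search and random search lack.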
The only difference between grid search and RandomizedSearchCV is that in grid search we define the combinations ourselves and train the model on each of them, whereas RandomizedSearchCV samples the combinations at random. Both are effective ways of tuning parameters to improve model generalizability.
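The difference described above can be sketched in a few lines of plain Python. This is a simplified illustration, not how scikit-learn implements GridSearchCV or RandomizedSearchCV: there is no cross-validation, the random variant samples with replacement, and param_grid and toy_score are hypothetical stand-ins for a real search space and a real validation score.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
param_grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 0.3]}

def toy_score(params):
    """Hypothetical validation score that peaks at max_depth=4, learning_rate=0.1."""
    return -abs(params["max_depth"] - 4) - 10 * abs(params["learning_rate"] - 0.1)

def grid_search(score_fn, grid):
    """Evaluate every combination that the user listed."""
    names = list(grid)
    trials = [dict(zip(names, values))
              for values in itertools.product(*grid.values())]
    return max(trials, key=score_fn), len(trials)

def random_search(score_fn, grid, n_iter, seed=0):
    """Evaluate n_iter combinations drawn at random from the same values."""
    rng = random.Random(seed)
    trials = [{name: rng.choice(values) for name, values in grid.items()}
              for _ in range(n_iter)]
    return max(trials, key=score_fn), len(trials)

best_grid, n_grid = grid_search(toy_score, param_grid)
best_random, n_random = random_search(toy_score, param_grid, n_iter=4)
print(best_grid, n_grid)      # all 9 combinations were scored
print(best_random, n_random)  # only 4 sampled combinations were scored
```

Grid search is guaranteed to find the best listed combination, at the cost of scoring all of them. Random search scores only as many as the budget allows and may miss the best one.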
Neither is strictly better; they are different approaches.
In grid search you try every possible combination of hyperparameters within some ranges.
In Bayesian optimization you don't try all the combinations; you search the hyperparameter space, learning as you go. This lets you avoid trying ALL the combinations.
So the pro of grid search is that it is exhaustive, and the pro of Bayesian optimization is that it doesn't need to be. Basically, if you have the computing power, go for grid search, but if the search space is too big, go for Bayesian optimization.
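A quick way to judge whether the space is "too big" for an exhaustive search is to count the combinations, since the grid size is the product of the number of values per hyperparameter. The numbers below (grid shapes and a 5-fold cross-validation) are made up for illustration.

```python
import math

def grid_size(values_per_param):
    """Number of combinations in a full grid: the product of the axis sizes."""
    return math.prod(values_per_param)

small = grid_size([3, 3])      # 2 hyperparameters, 3 values each
large = grid_size([10] * 5)    # 5 hyperparameters, 10 values each
fits = large * 5               # model fits needed with an assumed 5-fold CV

print(small)  # 9
print(large)  # 100000
print(fits)   # 500000
```

Nine combinations are easy to exhaust; half a million model fits usually are not, which is where a budgeted method such as random search or Bayesian optimization becomes attractive.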
Grid search is known to be worse than random search for optimizing hyperparameters [1], both in theory and in practice. Never use grid search unless you are optimizing one parameter only. On the other hand, Bayesian optimization is reported to outperform random search on various problems, including hyperparameter optimization [2]. However, this does not take into account several things: the generalization capabilities of models that use those hyperparameters, the effort needed to use Bayesian optimization compared to the much simpler random search, and the possibility of running random search in parallel.
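One intuition behind the argument in [1] is that, for the same budget, a grid repeats the same few values of each hyperparameter, while random sampling tries a new value of every hyperparameter on every trial. The sketch below counts distinct values per dimension for 9 trials in a 2-D unit square; the search space and the seed are arbitrary assumptions.

```python
import itertools
import random

def distinct_values(trials, dim):
    """How many different values of one hyperparameter were actually tried."""
    return len({trial[dim] for trial in trials})

# 3 x 3 grid: 9 trials, but only 3 values per hyperparameter.
axis = [0.0, 0.5, 1.0]
grid_trials = list(itertools.product(axis, axis))

# 9 random trials: each one draws a fresh value for every hyperparameter.
rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(9)]

grid_distinct = distinct_values(grid_trials, 0)
random_distinct = distinct_values(random_trials, 0)
print(len(grid_trials), grid_distinct)      # 9 3
print(len(random_trials), random_distinct)  # 9 9
```

If only one of the two hyperparameters really matters, the grid has effectively tested 3 settings of it and the random search 9, for the same cost.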
So in conclusion, my recommendation is: never use grid search, use random search if you just want to try a few hyperparameters and can try them in parallel (or if you want the hyperparameters to generalize to different problems), and use Bayesian optimization if you want the best results and are willing to use a more advanced method.
[1] Random Search for Hyper-Parameter Optimization, Bergstra & Bengio 2012.
[2] Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning: Analysis of the Black-Box Optimization Challenge 2020, Turner et al. 2021.