I am trying to use Bayesian optimization (Hyperopt) to obtain optimal parameters for an SVM. However, I find that the optimal parameters change with every run.
Below is a simple reproducible case. Can you please shed some light on this?
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.svm import SVC
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.model_selection import StratifiedShuffleSplit
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
def hyperopt_train_test(params):
    clf = svm.SVC(**params)
    return cross_val_score(clf, X, y).mean()
space4svm = {
    'C': hp.loguniform('C', -3, 3),
    'gamma': hp.loguniform('gamma', -3, 3),
}
def f(params):
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f, space4svm, algo=tpe.suggest, max_evals=1000, trials=trials)
print ('best:')
print (best)
Following are some of the optimal values.
best: {'C': 0.08776548401545513, 'gamma': 1.447360198193232}
best: {'C': 0.23621788050791617, 'gamma': 1.2467882092108042}
best: {'C': 0.3134163250819116, 'gamma': 1.0984778155489887}
That's because during the execution of fmin, hyperopt draws different values of 'C' and 'gamma' at random from the defined search space space4svm on each run of the program.
To fix this and produce deterministic results, you need to use the 'rstate' parameter of fmin:
rstate : numpy.RandomState, default numpy.random or `$HYPEROPT_FMIN_SEED`
    Each call to `algo` requires a seed value, which should be different on each call. This object is used to draw these seeds via `randint`. The default rstate is numpy.random.RandomState(int(env['HYPEROPT_FMIN_SEED'])) if the 'HYPEROPT_FMIN_SEED' environment variable is set to a non-empty string, otherwise np.random is used in whatever state it is in.
So if rstate is not set explicitly, by default it will check whether the environment variable 'HYPOPT_FMIN_SEED' is set. If it is not, it will use a different random state each time, which is why your results vary between runs.
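As an alternative to passing rstate, the docstring above suggests you can pin that environment variable instead. A minimal sketch (the seed value 42 is arbitrary, and depending on the hyperopt version the variable may need to be set before hyperopt is imported):
import os

# Per the docstring quoted above: when no rstate is passed, fmin falls back to
# HYPEROPT_FMIN_SEED. Fixing it to a constant should make the draws repeatable.
os.environ['HYPEROPT_FMIN_SEED'] = '42'  # any fixed integer, as a string

best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100, trials=trials)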
You can use it like this:
rstate = np.random.RandomState(42)  # use any number here, but keep it fixed
best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100, trials=trials, rstate=rstate)
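One caveat, in case the call above raises an error for you: more recent hyperopt releases (around 0.2.7 onwards, if I recall correctly; treat the exact version boundary as an assumption) expect a numpy Generator rather than a legacy RandomState, so the equivalent call would look like this:
import numpy as np

# Assumption: newer hyperopt versions draw their per-trial seeds from a numpy
# Generator, so rstate should be created with default_rng instead of RandomState.
rstate = np.random.default_rng(42)  # any fixed seed

best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100,
            trials=trials, rstate=rstate)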