Hyperopt: Optimal parameter changing with rerun

I am trying to use Bayesian optimization (Hyperopt) to obtain optimal parameters for an SVM classifier. However, I find that the optimal parameters change with every run.

Below is a simple reproducible case. Can you please shed some light on this?

import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

from sklearn import svm, datasets
from sklearn.model_selection import cross_val_score

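iris = datasets.load_iris()
X = iris.data[:, :2]  # use only the first two features
y = iris.target

def hyperopt_train_test(params):
    # Mean cross-validated accuracy of an SVC built from the sampled parameters
    clf = svm.SVC(**params)
    return cross_val_score(clf, X, y).mean()

# Log-uniform priors: log(C) and log(gamma) are uniform on [-3, 3]
space4svm = {
    'C': hp.loguniform('C', -3, 3),
    'gamma': hp.loguniform('gamma', -3, 3),
}

def f(params):
    # fmin minimizes, so return the negative accuracy as the loss
    acc = hyperopt_train_test(params)
    return {'loss': -acc, 'status': STATUS_OK}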

trials = Trials()

best = fmin(f, space4svm, algo=tpe.suggest, max_evals=1000, trials=trials)

print('best:')
print(best)

Here are the best values from three separate runs:

best: {'C': 0.08776548401545513, 'gamma': 1.447360198193232}

best: {'C': 0.23621788050791617, 'gamma': 1.2467882092108042}

best: {'C': 0.3134163250819116, 'gamma': 1.0984778155489887}

asked Dec 15 '18 14:12 by Regi Mathew


1 Answer

That's because, during the execution of fmin, hyperopt randomly draws different values of 'C' and 'gamma' from the defined search space space4svm on each run of the program, so the best combination it finds also differs from run to run.
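To illustrate the randomness, here is a minimal numpy sketch (an illustration of the idea, not hyperopt's own sampling code) of what `hp.loguniform('C', -3, 3)` does: it draws a value whose logarithm is uniform on [-3, 3], so `C` falls between e^-3 ≈ 0.05 and e^3 ≈ 20.1. An unseeded generator produces different candidates on every run, while a generator created from a fixed seed repeats them exactly. The helper name `draw_loguniform` is hypothetical.

```python
import numpy as np

def draw_loguniform(rng, low, high, n):
    # Hypothetical helper mirroring the idea behind hp.loguniform(label, low, high):
    # sample uniformly in log space, then exponentiate.
    return np.exp(rng.uniform(low, high, size=n))

# Unseeded: a fresh generator per "run" yields different candidate values of C.
run_a = draw_loguniform(np.random.default_rng(), -3, 3, 5)
run_b = draw_loguniform(np.random.default_rng(), -3, 3, 5)

# Seeded: the same seed reproduces the same candidates on every run.
seeded_a = draw_loguniform(np.random.default_rng(42), -3, 3, 5)
seeded_b = draw_loguniform(np.random.default_rng(42), -3, 3, 5)

print(run_a)
print(run_b)
print(np.array_equal(seeded_a, seeded_b))  # True
```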

To fix this and produce deterministic results, use the 'rstate' parameter of fmin:

rstate :

    numpy.RandomState, default numpy.random or `$HYPEROPT_FMIN_SEED`

    Each call to `algo` requires a seed value, which should be different
    on each call. This object is used to draw these seeds via `randint`.
    The default rstate is numpy.random.RandomState(int(env['HYPEROPT_FMIN_SEED']))
    if the 'HYPEROPT_FMIN_SEED' environment variable is set to a non-empty
    string, otherwise np.random is used in whatever state it is in.

So if rstate is not set explicitly, fmin checks whether the environment variable 'HYPEROPT_FMIN_SEED' is set. If it is not, fmin draws its seeds from the global np.random state, which is different on every run of the program.
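The fallback rule in the quoted docstring can be restated as a small sketch. `resolve_rstate` is a hypothetical name, not a hyperopt function; it only mirrors the documented behaviour: an explicit rstate wins, otherwise a non-empty `HYPEROPT_FMIN_SEED` seeds a `RandomState`, otherwise the global `np.random` module is used as-is.

```python
import os
import numpy as np

def resolve_rstate(rstate=None, env=None):
    # Hypothetical restatement of the rule quoted from the fmin docstring.
    if env is None:
        env = os.environ
    if rstate is not None:
        return rstate  # an explicitly passed rstate always wins
    seed = env.get("HYPEROPT_FMIN_SEED", "")
    if seed:
        # Non-empty variable: a RandomState seeded with that fixed value
        return np.random.RandomState(int(seed))
    # Otherwise: the global np.random module, in whatever state it is in
    return np.random

# With the variable set, two "runs" draw identical per-call seeds.
env = {"HYPEROPT_FMIN_SEED": "42"}
first = resolve_rstate(env=env).randint(2**31 - 1, size=3)
second = resolve_rstate(env=env).randint(2**31 - 1, size=3)
print((first == second).all())  # True
```

From a shell, the same effect should come from setting the variable when launching the script, e.g. `HYPEROPT_FMIN_SEED=42 python tune_svm.py` (the script name is hypothetical).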

You can use it like this:

rstate = np.random.RandomState(42)  # use any number here, but keep it fixed
# Newer hyperopt releases (0.2.7+) expect a Generator instead: np.random.default_rng(42)

best = fmin(f, space4svm, algo=tpe.suggest, max_evals=100, trials=trials, rstate=rstate)
answered Nov 15 '22 10:11 by Vivek Kumar