 

TensorFlow - How to implement hyperparameter random search?

Tags:

tensorflow

Consider this simple graph + session definition. Suppose I want to tune hyperparameters (the learning rate and the dropout keep probability) with a random search. What is the recommended way to implement it?

graph = tf.Graph()
with graph.as_default():

    # Placeholders
    data = tf.placeholder(tf.float32, shape=(None, img_h, img_w, num_channels), name='data')
    labels = ...
    dropout_keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    learning_rate = tf.placeholder(tf.float32, name='learning_rate')

    # model architecture...

with tf.Session(graph=graph) as session:
    tf.initialize_all_variables().run()
    for step in range(num_steps):
        offset = (step * batch_size) % (train_images.shape[0] - batch_size)
        # Generate a minibatch.
        batch_data = train_images[offset:(offset + batch_size), :]
        #...
        feed_train = {data: batch_data,
                      #...
                      learning_rate: 0.001,
                      dropout_keep_prob: 0.7
                     }

I tried putting everything inside a function:

def run_model(learning_rate, keep_prob):
    graph = tf.Graph()
    with graph.as_default():
        # graph here...

    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        # session here...

But I ran into scope issues (I am not very familiar with scopes in Python/TensorFlow). Is there a best practice to achieve this?

asked Nov 07 '16 by znat

People also ask

Is random search used for hyperparameter tuning?

The scikit-learn open-source Python machine learning library provides techniques to tune model hyperparameters. Specifically, it provides the RandomizedSearchCV class for random search and the GridSearchCV class for grid search.
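For reference, a minimal sketch of RandomizedSearchCV usage; the estimator and the parameter distributions below are illustrative assumptions, not recommendations:

from scipy.stats import loguniform, uniform
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    'alpha': loguniform(1e-5, 1e-1),  # regularization strength, sampled on a log scale
    'l1_ratio': uniform(0.0, 1.0),    # elastic-net mixing parameter, sampled uniformly
}
search = RandomizedSearchCV(SGDClassifier(penalty='elasticnet'),
                            param_distributions, n_iter=20, cv=3, random_state=0)
# search.fit(X_train, y_train)  # X_train / y_train are assumed to exist
# print(search.best_params_, search.best_score_)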

Is Hyperopt better than random search?

Hyperopt ends up a bit slower than random search, but note the significantly lower number of iterations it takes to reach the optimum. It also manages a relatively better score on the test set. This is why you would want to use Hyperopt. However, keep in mind that Hyperopt does not always end up on top.
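For comparison, a minimal Hyperopt sketch using its TPE optimizer; train_and_eval is a hypothetical function that trains the model and returns a validation loss to minimize:

import numpy as np
from hyperopt import fmin, tpe, hp

# search space: learning rate on a log scale, keep probability uniform
space = {
    'learning_rate': hp.loguniform('learning_rate', np.log(1e-5), np.log(1e-1)),
    'keep_prob': hp.uniform('keep_prob', 0.2, 0.8),
}

def objective(params):
    # train_and_eval is assumed to exist and return a loss to minimize
    return train_and_eval(params['learning_rate'], params['keep_prob'])

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)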

How do I tune CNN hyperparameters?

The hyperparameters to tune are the number of neurons, the activation function, the optimizer, the learning rate, the batch size, and the number of epochs. The second step is to tune the number of layers, a hyperparameter that conventional (non-deep) algorithms do not have.
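As a rough sketch, these hyperparameters can be sampled randomly with plain Python; every range and choice below is an illustrative assumption:

import random

def sample_cnn_hyperparams():
    '''draw one random configuration from a hypothetical search space'''
    return {
        'num_layers': random.randint(2, 6),
        'neurons': random.choice([32, 64, 128, 256]),
        'activation': random.choice(['relu', 'tanh', 'elu']),
        'optimizer': random.choice(['adam', 'sgd', 'rmsprop']),
        'learning_rate': 10 ** random.uniform(-5, -1),  # log-scale sample
        'batch_size': random.choice([32, 64, 128]),
        'epochs': random.randint(5, 30),
    }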


1 Answer

I implemented random search of hyper-parameters in a similar way, and things worked out fine. Basically, I have a function that generates random hyper-parameters outside of the graph and session. I wrapped the graph and session into a function as you did, and I passed in the generated hyper-parameters. See the code for a better illustration.

import numpy as np

def generate_random_hyperparams(lr_min, lr_max, kp_min, kp_max):
    '''generate a random learning rate and keep probability'''
    # random search through log space for the learning rate
    random_learning_rate = 10**np.random.uniform(lr_min, lr_max)
    random_keep_prob = np.random.uniform(kp_min, kp_max)
    return random_learning_rate, random_keep_prob
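Sampling the exponent uniformly and raising 10 to it searches the learning rate on a log scale: with lr_min = -5 and lr_max = -1, as in the usage example further down, the sampled rates fall between 1e-5 and 0.1, which matches how learning rates are usually searched.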

I suspect the scope issue you are running into (since you didn't provide the exact error message, I can only speculate) is caused by some careless naming... I would modify how you are naming the variables in your run_model function.

def run_model(random_learning_rate, random_keep_prob):
    # Note that the arguments are named differently from the placeholders in the graph
    graph = tf.Graph()
    with graph.as_default():
        # graph here...
        learning_rate = tf.placeholder(tf.float32, name='learning_rate')
        keep_prob = tf.placeholder(tf.float32, name='keep_prob')
        # other operations ...

    with tf.Session(graph=graph) as session:
        tf.initialize_all_variables().run()
        # session here...
        feed_train = {data: batch_data,
                      # placeholder variables as dict keys, python value variables as dict values
                      learning_rate: random_learning_rate,
                      keep_prob: random_keep_prob
                     }
        # evaluate performance with random_learning_rate and random_keep_prob
        performance = session.run([...], feed_dict=feed_train)
    return performance

Remember to use different names for the tf.placeholder tensors and the Python variables carrying the actual values.
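To see why this matters, here is a hypothetical sketch of the clash being warned about: if a function argument shares its name with a placeholder, the placeholder assignment rebinds the name and the passed-in value is lost:

def run_model(learning_rate, keep_prob):          # learning_rate is a python float here
    learning_rate = tf.placeholder(tf.float32)    # ...and now it is a tensor; the float is gone
    feed_train = {learning_rate: learning_rate}   # key and value are the same tensor -- broken feed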

The usage of the two snippets above (generate_random_hyperparams and run_model) would be something like:

performance_records = {}
for i in range(10): # random search hyper-parameter space 10 times
    random_learning_rate, random_keep_prob = generate_random_hyperparams(-5, -1, 0.2, 0.8)
    performance = run_model(random_learning_rate, random_keep_prob)
    performance_records[(random_learning_rate, random_keep_prob)] = performance
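Once the loop finishes, the best configuration can be read off the records, e.g. (assuming a larger performance value is better):

best_lr, best_keep_prob = max(performance_records, key=performance_records.get)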
answered Dec 11 '22 by Zhongyu Kuang