Tensorflow RNN weight matrices initialization

Tags:

recurrent-neural-network

I'm using bidirectional_rnn with GRUCell, but this is a general question about RNNs in TensorFlow.

I couldn't find out how the weight matrices (input-to-hidden, hidden-to-hidden) are initialized. Are they initialized randomly? To zeros? Are they initialized differently for each LSTM I create?

EDIT: Another motivation for this question is pre-training some LSTMs and using their weights in a subsequent model. I don't currently know how to do that without saving all the states and restoring the entire model.

Thanks.


asked Oct 29 '16 11:10

yoki

2 Answers

How to initialize weight matrices for RNN?

I believe people commonly use random normal initialization for RNN weight matrices. See the example in the TensorFlow GitHub repo. The notebook is fairly long, but it contains a simple LSTM model that uses tf.truncated_normal to initialize the weights and tf.zeros to initialize the biases (I have also tried tf.ones for the biases, and that seemed to work too). The standard deviation is a hyperparameter you can tune yourself.

Weight initialization can matter for gradient flow. That said, as far as I know the LSTM itself is designed to mitigate the vanishing-gradient problem (and gradient clipping helps with exploding gradients), so perhaps you don't need to be especially careful with the standard deviation for an LSTM. I've read papers recommending Xavier initialization (see the TF API doc for the Xavier initializer) in the context of convolutional neural networks. I don't know whether people use it for RNNs, but you could try it there to see if it helps.
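A minimal NumPy sketch of the two schemes mentioned above, as an illustration only (not TensorFlow's code): `truncated_normal` redraws any sample that falls more than two standard deviations from the mean, which is the truncation rule documented for `tf.truncated_normal`, and `xavier_uniform` samples from U(-limit, limit) with the Glorot/Xavier limit sqrt(6 / (fan_in + fan_out)). The helper names and the sizes are hypothetical.

```python
import numpy as np

def truncated_normal(shape, mean=0.0, stddev=0.1, rng=None):
    """Sample N(mean, stddev**2), redrawing values more than 2 stddev from the mean."""
    rng = np.random.default_rng() if rng is None else rng
    out = rng.normal(mean, stddev, size=shape)
    bad = np.abs(out - mean) > 2 * stddev
    while bad.any():
        # Redraw only the out-of-range entries until every value is within 2 stddev.
        out[bad] = rng.normal(mean, stddev, size=int(bad.sum()))
        bad = np.abs(out - mean) > 2 * stddev
    return out

def xavier_uniform(shape, rng=None):
    """Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng() if rng is None else rng
    fan_in, fan_out = shape
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)

rng = np.random.default_rng(0)
vocabulary_size, num_nodes = 27, 64   # hypothetical sizes
wx = truncated_normal((vocabulary_size, num_nodes), stddev=0.1, rng=rng)  # input -> hidden
wt = xavier_uniform((num_nodes, num_nodes), rng=rng)                      # hidden -> hidden
b = np.zeros((1, num_nodes))                                              # biases start at zero
```

The truncation keeps a few unusually large initial weights from saturating the sigmoid/tanh gates, while the Xavier limit shrinks automatically as the layer gets wider.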

Now, to follow up on @Allen's answer and the follow-up question you left in the comments:

How to control initialization with variable scope?

Take the simple LSTM model in the TensorFlow GitHub Python notebook linked above as an example. If I wanted to refactor the LSTM part of that code using variable scopes, I might write something like the following...

import tensorflow as tf  # TF 1.x graph API (tf.variable_scope / tf.get_variable); under TF 2.x use tensorflow.compat.v1

def initialize_LSTMcell(vocabulary_size, num_nodes, initializer):
    '''Create the LSTM cell weights and biases under the 'LSTMcell' scope.'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    with tf.variable_scope('LSTMcell'):
        for gate in gates:
            with tf.variable_scope(gate):
                # Pass initializer by keyword: get_variable's third positional argument is dtype.
                tf.get_variable("wx", [vocabulary_size, num_nodes], initializer=initializer)     # input -> hidden
                tf.get_variable("wt", [num_nodes, num_nodes], initializer=initializer)           # hidden -> hidden
                tf.get_variable("bi", [1, num_nodes], initializer=tf.constant_initializer(0.0))  # bias
    # No reuse_variables() call is needed here: LSTMcell() below reopens this scope with reuse=True.

def get_scope_variables(scope_name, variable_names):
    '''Fetch existing variables by scope name and variable name.'''
    scope_vars = {}
    with tf.variable_scope(scope_name, reuse=True):
        for var_name in variable_names:
            scope_vars[var_name] = tf.get_variable(var_name)
    return scope_vars

def LSTMcell(i, o, state):
    '''Perform one LSTM cell step using the shared 'LSTMcell' variables.'''
    gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
    var_names = ['wx', 'wt', 'bi']
    gate_comp = {}
    with tf.variable_scope('LSTMcell', reuse=True):
        for gate in gates:
            gate_vars = get_scope_variables(gate, var_names)  # resolves LSTMcell/<gate>/<name>
            gate_comp[gate] = tf.matmul(i, gate_vars['wx']) + tf.matmul(o, gate_vars['wt']) + gate_vars['bi']
    state = tf.sigmoid(gate_comp['forget_gate']) * state + tf.sigmoid(gate_comp['input_gate']) * tf.tanh(gate_comp['memory_cell'])
    output = tf.sigmoid(gate_comp['output_gate']) * tf.tanh(state)
    return output, state
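To sanity-check the gate arithmetic in `LSTMcell` without building a graph, here is a NumPy sketch of the same single step. It assumes the same per-gate layout (`wx`, `wt`, `bi` for each of the four gates); `lstm_step`, the random weights and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(params, i, o, state):
    """One LSTM step. params[gate] holds 'wx' (input_dim, num_nodes),
    'wt' (num_nodes, num_nodes) and 'bi' (1, num_nodes) for each gate."""
    z = {gate: i @ p['wx'] + o @ p['wt'] + p['bi'] for gate, p in params.items()}
    state = sigmoid(z['forget_gate']) * state + sigmoid(z['input_gate']) * np.tanh(z['memory_cell'])
    output = sigmoid(z['output_gate']) * np.tanh(state)
    return output, state

rng = np.random.default_rng(0)
batch, input_dim, num_nodes = 2, 5, 3   # hypothetical sizes
gates = ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']
params = {g: {'wx': rng.normal(0.0, 0.1, (input_dim, num_nodes)),
              'wt': rng.normal(0.0, 0.1, (num_nodes, num_nodes)),
              'bi': np.zeros((1, num_nodes))} for g in gates}
i = rng.normal(size=(batch, input_dim))   # current input
o = np.zeros((batch, num_nodes))          # previous output
state = np.zeros((batch, num_nodes))      # previous cell state
output, state = lstm_step(params, i, o, state)
```

Because the output is a sigmoid times a tanh, every entry stays strictly inside (-1, 1), which is a quick check that the gates are wired as intended.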

Usage of the refactored code would look something like this...

initialize_LSTMcell(vocabulary_size, num_nodes, tf.truncated_normal_initializer(mean=0.0, stddev=0.1))  # zero-mean weights; stddev is a hyperparameter
#...Doing some computation...
output, state = LSTMcell(input_tensor, output_tensor, state)

Even though the refactored code may look less straightforward, using variable scopes ensures scope encapsulation and allows flexible control over variables (in my opinion, at least).

How to use pre-trained LSTM weights in a subsequent model without saving all the states and restoring the entire model?

Assuming you have a pre-trained model frozen and loaded in, if you want to use its frozen 'wx', 'wt' and 'bi' variables, you can find their parent scope names and variable names, then fetch the variables with the same structure as the get_scope_variables function:

with tf.variable_scope(scope_name, reuse=True):
    var = tf.get_variable(var_name)
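If the pre-trained model lives in a separate graph or process, another option (not from the original answer) is to export only the LSTM arrays under their scoped names and load them into the new model; `tf.train.Saver` with a `var_list` restricted to those variables is the built-in TF 1.x route. The sketch below shows just the framework-independent save/load step with NumPy. In a TF 1.x model you would get the arrays with `sess.run` on the variables and could hand each one to `tf.constant_initializer` when recreating the matching variable; that wiring is an assumption, and `save_scoped_weights`, `load_scoped_weights` and the file name are hypothetical.

```python
import os
import tempfile
import numpy as np

def save_scoped_weights(path, weights):
    """Save a {scoped_name: array} dict such as {'LSTMcell/input_gate/wx': ...}.
    '/' is swapped for '__' so each name is a flat member of the .npz archive
    (assumes the original names contain no '__')."""
    np.savez(path, **{name.replace('/', '__'): arr for name, arr in weights.items()})

def load_scoped_weights(path, prefix):
    """Load only the arrays whose scoped name starts with `prefix`."""
    loaded = {}
    with np.load(path) as data:
        for key in data.files:
            name = key.replace('__', '/')
            if name.startswith(prefix):
                loaded[name] = data[key]
    return loaded

# Hypothetical "pre-trained" weights: four LSTM gates plus an unrelated softmax layer.
rng = np.random.default_rng(0)
pretrained = {}
for gate in ['input_gate', 'forget_gate', 'memory_cell', 'output_gate']:
    pretrained[f'LSTMcell/{gate}/wx'] = rng.normal(size=(5, 3))
    pretrained[f'LSTMcell/{gate}/wt'] = rng.normal(size=(3, 3))
    pretrained[f'LSTMcell/{gate}/bi'] = np.zeros((1, 3))
pretrained['softmax/w'] = rng.normal(size=(3, 5))

path = os.path.join(tempfile.mkdtemp(), 'pretrained_weights.npz')
save_scoped_weights(path, pretrained)
lstm_weights = load_scoped_weights(path, 'LSTMcell/')  # only the 12 LSTM arrays
```

Filtering on the scope prefix is what lets you pull in just the LSTM part while the rest of the old model (here the softmax layer) is left behind.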

For more background, see the TensorFlow documentation on variable scopes and sharing variables. I hope this is helpful.


answered Jan 04 '23 06:01

Zhongyu Kuang

The RNN models create their variables with get_variable, so you can control their initialization by wrapping the code that creates those variables in a variable_scope and passing a default initializer to it. Unless the RNN specifies an initializer explicitly (looking at the code, it doesn't), get_variable falls back to uniform_unit_scaling_initializer (later TensorFlow 1.x releases fall back to glorot_uniform_initializer instead).
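For intuition about that fallback, here is a NumPy sketch of the rule as I understand it from the `uniform_unit_scaling_initializer` documentation: sample U(-max_val, max_val) with max_val = factor * sqrt(3 / input_size), where input_size is the product of every dimension except the last. The variance of U(-a, a) is a²/3, so each weight gets variance factor²/input_size and unit-variance inputs produce roughly unit-variance outputs. The function name and sizes are hypothetical; this is not TensorFlow's code.

```python
import math
import numpy as np

def uniform_unit_scaling(shape, factor=1.0, rng=None):
    """U(-max_val, max_val) with max_val = factor * sqrt(3 / input_size)."""
    rng = np.random.default_rng() if rng is None else rng
    input_size = max(float(np.prod(shape[:-1])), 1.0)  # all dims except the last
    max_val = factor * math.sqrt(3.0 / input_size)
    return rng.uniform(-max_val, max_val, size=shape)

rng = np.random.default_rng(0)
w = uniform_unit_scaling((256, 128), rng=rng)   # hypothetical 256 -> 128 weight matrix
x = rng.normal(size=(1000, 256))                # unit-variance inputs
weight_var_scaled = w.var() * 256               # should be close to 1 (= factor**2)
output_var = (x @ w).var()                      # should also be close to 1
```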

You should also be able to share model weights by declaring the second model and passing reuse=True to its variable_scope. As long as the namespaces match up, the new model will get the same variables as the first model.
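To make "as long as the namespaces match up" concrete, here is a toy, pure-Python model of the reuse rules: variables are keyed by their full scoped name, creating an existing name without reuse is an error, and a reusing scope returns the already-created object (with reuse inherited by child scopes, as in TF 1.x). `ToyVariableStore` and `build_model` are hypothetical and heavily simplified; this is not TensorFlow's implementation.

```python
from contextlib import contextmanager

class ToyVariableStore:
    """Hypothetical, simplified model of variable_scope + get_variable reuse rules."""

    def __init__(self):
        self._vars = {}    # full scoped name -> variable object
        self._stack = []   # open scopes as (name, reuse) pairs

    @contextmanager
    def variable_scope(self, name, reuse=False):
        # Reuse is inherited: inside a reusing scope, child scopes reuse too.
        inherited = bool(self._stack) and self._stack[-1][1]
        self._stack.append((name, reuse or inherited))
        try:
            yield
        finally:
            self._stack.pop()

    def get_variable(self, name, initializer):
        full_name = '/'.join([scope for scope, _ in self._stack] + [name])
        reuse = bool(self._stack) and self._stack[-1][1]
        if reuse:
            if full_name not in self._vars:
                raise ValueError(f'{full_name} does not exist, so it cannot be reused')
            return self._vars[full_name]
        if full_name in self._vars:
            raise ValueError(f'{full_name} already exists; pass reuse=True to share it')
        self._vars[full_name] = initializer()
        return self._vars[full_name]

def build_model(store, reuse):
    """Hypothetical model builder that uses the same scope names on every call."""
    with store.variable_scope('rnn', reuse=reuse):
        with store.variable_scope('gru_cell'):
            return store.get_variable('weights', initializer=lambda: [0.0] * 4)

store = ToyVariableStore()
w_first = build_model(store, reuse=False)   # creates rnn/gru_cell/weights
w_second = build_model(store, reuse=True)   # same names, so the same object comes back
```

If the second model used a different scope name (say 'rnn2'), the lookup would miss and fail under reuse=True, which is why the namespaces have to match.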

answered Jan 04 '23 06:01

Allen Lavoie
