I'm trying to train an LSTM in Tensorflow using minibatches, but after training is complete I would like to use the model by submitting one example at a time to it. I can set up the graph within Tensorflow to train my LSTM network, but I can't use the trained result afterward in the way I want.
The setup code looks something like this:
#Build the LSTM model.
cellRaw = rnn_cell.BasicLSTMCell(LAYER_SIZE)
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS)
cell = rnn_cell.DropoutWrapper(cellRaw, output_keep_prob = 0.25)
input_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None, 3])
target_data = tf.placeholder(dtype=tf.float32, shape=[SEQ_LENGTH, None])
initial_state = cell.zero_state(batch_size=BATCH_SIZE, dtype=tf.float32)
with tf.variable_scope('rnnlm'):
output_w = tf.get_variable("output_w", [LAYER_SIZE, 6])
output_b = tf.get_variable("output_b", [6])
outputs, final_state = seq2seq.rnn_decoder(input_list, initial_state, cell, loop_function=None, scope='rnnlm')
output = tf.reshape(tf.concat(1, outputs), [-1, LAYER_SIZE])
output = tf.nn.xw_plus_b(output, output_w, output_b)
...Note the two placeholders, input_data and target_data. I haven't bothered including the optimizer setup. After training is complete and the training session closed, I would like to set up a new session that uses the trained LSTM network whose input is provided by a completely different placeholder, something like:
with tf.Session() as sess:
with tf.variable_scope("simulation", reuse=None):
cellSim = cellRaw
input_data_sim = tf.placeholder(dtype=tf.float32, shape=[1, 1, 3])
initial_state_sim = cell.zero_state(batch_size=1, dtype=tf.float32)
input_list_sim = tf.unpack(input_data_sim)
outputsSim, final_state_sim = seq2seq.rnn_decoder(input_list_sim, initial_state_sim, cellSim, loop_function=None, scope='rnnlm')
outputSim = tf.reshape(tf.concat(1, outputsSim), [-1, LAYER_SIZE])
with tf.variable_scope('rnnlm'):
output_w = tf.get_variable("output_w", [LAYER_SIZE, nOut])
output_b = tf.get_variable("output_b", [nOut])
outputSim = tf.nn.xw_plus_b(outputSim, output_w, output_b)
This second part returns the following error:
tensorflow.python.framework.errors.InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder' with dtype float
[[Node: Placeholder = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
...Presumably because the graph I'm using still has the old training placeholders attached to the trained LSTM nodes. What's the right way to 'extract' the trained LSTM and put it into a new, different graph that has a different style of inputs? The Varible scoping features that Tensorflow has seem to address something like this, but the examples in the documentation all talk about using variable scope as a way of managing variable names so that the same piece of code will generate similar subgraphs within the same graph. The 'reuse' feature seems to be close to what I want, but I don't find the Tensorflow documentation linked above to be clear at all on what it does. The cells themselves cannot be given a name (in other words,
cellRaw = rnn_cell.MultiRNNCell([cellRaw] * NUM_LAYERS, name="multicell")
is not valid), and while I can give a name to a seq2seq.rnn_decoder(), I presumably wouldn't be able to remove the rnn_cell.DropoutWrapper() if I used that node unchanged.
Questions:
What is the proper way to move trained LSTM weights from one graph to another?
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
Thanks!
What is the proper way to move trained LSTM weights from one graph to another?
You can create your decoding graph first (with a saver object to save the parameters) and create a GraphDef object that you can import in your bigger training graph:
basegraph = tf.Graph()
with basegraph.as_default():
***your graph***
traingraph = tf.Graph()
with traingraph.as_default():
tf.import_graph_def(basegraph.as_graph_def())
***your training graph***
make sure you load your variables when you start a session for a new graph.
I don't have experience with this functionality so you may have to look into it a bit more
Is it correct to say that starting a new session "releases resources", but doesn't erase the graph built in memory?
yep, the graph object still hold it
It seems to me like the 'reuse' feature allows Tensorflow to search outside of the current variable scope for variables with the same name (existing in a different scope), and use them in the current scope. Is this correct? If it is, what happens to all of the graph edges from the non-current scope that link to that variable? If it isn't, why does Tensorflow throw an error if you try to have the same variable name within two different scopes? It seems perfectly reasonable to define two variables with identical names in two different scopes, e.g. conv1/sum1 and conv2/sum1.
No, reuse is to determine the behaviour when you use get_variable on an existing name, when it is true it will return the existing variable, otherwise it will return a new one. Normally tensorflow should not throw an error. Are you sure your using tf.get_variable and not just tf.Variable?
In my code I'm working within a new scope but the graph won't run without data to be fed into a placeholder from the initial, default scope. Is the default scope always 'in-scope' for some reason?
I don't really see what you mean. The do not always have to be used. If a placeholder is not required for running an operation you don't have to define it.
If graph edges can span different scopes, and names in different scopes can't be shared unless they refer to the exact same node, then that would seem to defeat the purpose of having different scopes in the first place. What am I misunderstanding here?
I think your understanding or usage of scopes is flawed, see above
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With