I am trying to create a network in TensorFlow with multiple softmax outputs, each of a different size. The network architecture is: Input -> LSTM -> Dropout, followed by two softmax layers: one with 10 outputs and one with 20 outputs. The reason is that I want to generate two sets of outputs (10 and 20) and then combine them to produce a final output. I'm not sure how to do this in TensorFlow.
Previously, to build a network like the one described but with a single softmax, I think I can do something like this:
import tensorflow as tf

# batch_size, maxlength, vocabsize, hidden_layer_size and random_seed are defined elsewhere
inputs = tf.placeholder(tf.int32, [batch_size, maxlength])   # token ids (embedding_lookup needs integer ids)
lengths = tf.placeholder(tf.int32, [batch_size])
embeddings = tf.Variable(tf.random_uniform([vocabsize, 256], -1, 1))
lstm = {}
lstm[0] = tf.contrib.rnn.LSTMCell(hidden_layer_size, state_is_tuple=True,
                                  initializer=tf.contrib.layers.xavier_initializer(seed=random_seed))
lstm[0] = tf.contrib.rnn.DropoutWrapper(lstm[0], output_keep_prob=0.5)
lstm[0] = tf.contrib.rnn.MultiRNNCell(cells=[lstm[0]] * 1, state_is_tuple=True)
output_layer = {}
output_layer[0] = Layer.W(1 * hidden_layer_size, 20, 'OutputLayer')   # Layer.W / Layer.b are my own weight/bias helpers
output_bias = {}
output_bias[0] = Layer.b(20, 'OutputBias')
outputs = {}
fstate = {}
with tf.variable_scope("lstm0"):
    # create the rnn graph at run time
    outputs[0], fstate[0] = tf.nn.dynamic_rnn(lstm[0], tf.nn.embedding_lookup(embeddings, inputs),
                                              sequence_length=lengths,
                                              dtype=tf.float32)
logits = {}
logits[0] = tf.matmul(tf.concat([f.h for f in fstate[0]], 1), output_layer[0]) + output_bias[0]
loss = {}
# labels[0] is a [batch_size, 20] one-hot label tensor defined elsewhere
loss[0] = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits[0], labels=labels[0]))
However, now I want my RNN output (after the dropout) to flow into two softmax layers, one of size 10 and another of size 20. Does anyone have an idea of how to do this?
Thanks
Edit: Ideally I would like to use a version of softmax such as the one defined in this Knet Julia library. Does TensorFlow have an equivalent? https://github.com/denizyuret/Knet.jl/blob/1ef934cc58f9671f2d85063f88a3d6959a49d088/deprecated/src7/op/actf.jl#L103
You aren't defining the logits for the size-10 softmax layer in your code; you would have to do that explicitly.
Once that is done, you can use tf.nn.softmax, applying it separately to each of your logit tensors.
For example, for your 20-class softmax tensor:
softmax20 = tf.nn.softmax(logits[0])
For the other layer, you could do:
output_layer[1] = Layer.W(1 * hidden_layer_size, 10, 'OutputLayer10')
output_bias[1] = Layer.b(10, 'OutputBias10')
logits[1] = tf.matmul(tf.concat([f.h for f in fstate[0]], 1),
                      output_layer[1]) + output_bias[1]
softmax10 = tf.nn.softmax(logits[1])
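If you also want a training loss for the 10-way head, you can mirror what you already do for the 20-way one; labels[1] below is an assumption (a [batch_size, 10] one-hot label tensor you would define yourself):
# labels[1] is assumed, not part of your snippet
loss[1] = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits[1], labels=labels[1]))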
There is also a tf.contrib.layers.softmax which allows you to apply the softmax on the final axis of a tensor with greater than 2 dimensions, but it doesn't look like you need anything like that. tf.nn.softmax should work here.
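Purely for illustration (you do not need it for your 2D logits), here is a sketch of what that looks like on a hypothetical 3D tensor of per-timestep scores:
# Hypothetical per-timestep class scores, shape [batch_size, maxlength, 20]
per_step_logits = tf.random_normal([batch_size, maxlength, 20])
# Softmax is taken over the last axis, independently for every batch/time position
per_step_probs = tf.contrib.layers.softmax(per_step_logits)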
Side note: output_layer is not the greatest name for that dict; it should be something involving weights. These weights and biases (output_layer, output_bias) also do not represent the output layer of your network (as that will come from whatever you do to your softmax outputs, right?). [Sorry, couldn't help myself.]
You can do the following on the output of your RNN (below called rnn_output, the same 2D concatenation of final states from fstate[0] that you already feed into your logits) in order to compute the two softmaxes and the corresponding losses:
with tf.variable_scope("softmax_0"):
# Transform you RNN output to the right output size = 10
W = tf.get_variable("kernel_0", [output[0].get_shape()[1], 10])
logits_0 = tf.matmul(inputs, W)
# Apply the softmax function to the logits (of size 10)
output_0 = tf.nn.softmax(logits_0, name = "softmax_0")
# Compute the loss (as you did in your question) with softmax_cross_entropy_with_logits directly applied on logits
loss_0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_0, labels=labels[0]))
with tf.variable_scope("softmax_1"):
# Transform you RNN output to the right output size = 20
W = tf.get_variable("kernel_1", [output[0].get_shape()[1], 20])
logits_1 = tf.matmul(inputs, W)
# Apply the softmax function to the logits (of size 20)
output_1 = tf.nn.softmax(logits_1, name = "softmax_1")
# Compute the loss (as you did in your question) with softmax_cross_entropy_with_logits directly applied on logits
loss_1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_1, labels=labels[1]))
You can then combine the two losses if it is relevant to your application:
total_loss = loss_0 + loss_1
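If you train the two heads jointly, a minimal sketch would be to minimize this combined loss with an optimizer (the choice of Adam and the learning rate below are arbitrary placeholders, not prescribed by your setup):
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)  # any optimizer/learning rate you prefer
train_op = optimizer.minimize(total_loss)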
EDIT To answer your question in the comments about what you specifically want to do with the two softmax outputs, you can do approximately the following:
with tf.variable_scope("second_part"):
W1 = tf.get_variable("W_1", [output_1.get_shape()[1], n])
W2 = tf.get_variable("W_2", [output_2.get_shape()[1], n])
prediction = tf.matmul(output_1, W1) + tf.matmul(output_2, W2)
with tf.variable_scope("optimization_part"):
loss = tf.reduce_mean(tf.squared_difference(prediction, label))
You just need to define n, the number of columns of W1 and W2.
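For completeness, here is a minimal sketch of one training step; input_batch, length_batch and label_batch are hypothetical names for NumPy arrays shaped like your placeholders, and the optimizer/learning rate are arbitrary examples, none of this is taken from your code:
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)  # arbitrary optimizer/learning rate

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # inputs, lengths and label are the tensors defined above;
    # input_batch, length_batch and label_batch are NumPy arrays you supply
    _, loss_value = sess.run([train_op, loss],
                             feed_dict={inputs: input_batch,
                                        lengths: length_batch,
                                        label: label_batch})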