How to Have Multiple Softmax Outputs in Tensorflow?

I am trying to create a network in TensorFlow with multiple softmax outputs, each of a different size. The network architecture is: Input -> LSTM -> Dropout. From there I have two softmax layers: one with 10 outputs and one with 20 outputs. The reason is that I want to generate two sets of outputs (10 and 20) and then combine them to produce a final output. I'm not sure how to do this in TensorFlow.

Previously, to build a network like the one described but with a single softmax, I think I could do something like this:

inputs = tf.placeholder(tf.int32, [batch_size, maxlength])  # token ids for the embedding lookup
lengths = tf.placeholder(tf.int32, [batch_size])
embeddings = tf.Variable(tf.random_uniform([vocabsize, 256], -1, 1))
lstm = {}
lstm[0] = tf.contrib.rnn.LSTMCell(hidden_layer_size, state_is_tuple=True, initializer=tf.contrib.layers.xavier_initializer(seed=random_seed))
lstm[0] = tf.contrib.rnn.DropoutWrapper(lstm[0], output_keep_prob=0.5)
lstm[0] = tf.contrib.rnn.MultiRNNCell(cells=[lstm[0]] * 1, state_is_tuple=True)
output_layer = {}
output_layer[0] = Layer.W(1 * hidden_layer_size, 20, 'OutputLayer')
output_bias = {}
output_bias[0] = Layer.b(20, 'OutputBias')
outputs = {}
fstate = {}
with tf.variable_scope("lstm0"):
    # build the RNN graph at run time
    outputs[0], fstate[0] = tf.nn.dynamic_rnn(lstm[0],
                                              tf.nn.embedding_lookup(embeddings, inputs),
                                              sequence_length=lengths,
                                              dtype=tf.float32)
logits = {}
logits[0] = tf.matmul(tf.concat([f.h for f in fstate[0]], 1), output_layer[0]) + output_bias[0]
loss = {}
loss[0] = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits[0], labels=labels[0]))

However, now I want my RNN output (after the dropout) to flow into two softmax layers, one of size 10 and another of size 20. Does anyone have an idea of how to do this?

Thanks

Edit: Ideally I would like to use a version of softmax such as the one defined here in this Knet Julia library. Does TensorFlow have an equivalent? https://github.com/denizyuret/Knet.jl/blob/1ef934cc58f9671f2d85063f88a3d6959a49d088/deprecated/src7/op/actf.jl#L103

asked Oct 09 '17 by hockeybro


2 Answers

You aren't defining the logits for the size-10 softmax layer in your code; you would have to do that explicitly.

Once that was done, you could use tf.nn.softmax, applying it separately to both of your logit tensors.

For example, for your 20-class softmax tensor:

softmax20 = tf.nn.softmax(logits[0])

For the other layer, you could do:

output_layer[1] = Layer.W(1 * hidden_layer_size, 10, 'OutputLayer10')
output_bias[1] = Layer.b(10, 'OutputBias10')

logits[1] = tf.matmul(tf.concat([f.h for f in fstate[0]], 1),
                      output_layer[1]) + output_bias[1]

softmax10 = tf.nn.softmax(logits[1])

There is also tf.contrib.layers.softmax, which allows you to apply the softmax on the final axis of a tensor with more than 2 dimensions, but it doesn't look like you need anything like that; tf.nn.softmax should work here.
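For reference, here is a minimal sketch of that higher-dimensional case (the per-timestep logits tensor below is a made-up example, not something taken from your graph):

import tensorflow as tf

# Hypothetical per-timestep logits: shape [batch_size, max_time, num_classes]
seq_logits = tf.random_normal([32, 50, 20])

# Both of these normalize over the last axis, giving per-timestep class probabilities
probs_contrib = tf.contrib.layers.softmax(seq_logits)
probs_nn = tf.nn.softmax(seq_logits)  # tf.nn.softmax also defaults to the last axis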

Side note: output_layer is not the greatest name for that variable - it should be something involving weights. These weights and biases (output_layer, output_bias) also do not represent the output layer of your network (that will come from whatever you do with your softmax outputs, right?). [Sorry, couldn't help myself.]

answered Sep 30 '22 by Neeraj Kashyap


You can do the following with your RNN's output in order to compute the two softmaxes and the corresponding losses (below, rnn_out is the same 2-D tensor you built from the final state in your question):

with tf.variable_scope("softmax_0"):
    # Transform you RNN output to the right output size = 10
    W = tf.get_variable("kernel_0", [output[0].get_shape()[1], 10])
    logits_0 = tf.matmul(inputs, W)
    # Apply the softmax function to the logits (of size 10)
    output_0 = tf.nn.softmax(logits_0, name = "softmax_0")
    # Compute the loss (as you did in your question) with softmax_cross_entropy_with_logits directly applied on logits
    loss_0 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_0, labels=labels[0]))

with tf.variable_scope("softmax_1"):  
    # Transform you RNN output to the right output size = 20
    W = tf.get_variable("kernel_1", [output[0].get_shape()[1], 20])
    logits_1 = tf.matmul(inputs, W)
    # Apply the softmax function to the logits (of size 20)
    output_1 = tf.nn.softmax(logits_1, name = "softmax_1")
    # Compute the loss (as you did in your question) with softmax_cross_entropy_with_logits directly applied on logits
    loss_1 = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits_1, labels=labels[1]))

You can then combine the two losses if it is relevant to your application:

total_loss = loss_0 + loss_1
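
If you then want to train on the combined loss, here is a minimal sketch (the optimizer choice and learning rate are assumptions on my part, not part of the answer above):

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)  # assumed optimizer and learning rate
train_op = optimizer.minimize(total_loss)

# At run time, one training step per batch, e.g.:
# sess.run(train_op, feed_dict={inputs: batch_inputs, lengths: batch_lengths, ...})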

EDIT To answer your question in the comments about what you specifically need to do with the two softmax outputs: you could do approximately the following:

with tf.variable_scope("second_part"):
    W1 = tf.get_variable("W_1", [output_1.get_shape()[1], n])
    W2 = tf.get_variable("W_2", [output_2.get_shape()[1], n])
    prediction = tf.matmul(output_1, W1) + tf.matmul(output_2, W2)
with tf.variable_scope("optimization_part"):
    loss = tf.reduce_mean(tf.squared_difference(prediction, label))

You just need to define n, the number of columns of W1 and W2.
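
For instance, if the combined output should itself be a classification over, say, 5 classes (the class count, the final_labels placeholder, and the switch to a softmax cross-entropy loss are all assumptions for illustration), the second part could look like:

n = 5  # hypothetical number of final classes
final_labels = tf.placeholder(tf.float32, [batch_size, n])  # assumed one-hot targets
with tf.variable_scope("second_part"):
    W1 = tf.get_variable("W_1", [output_0.get_shape()[1], n])
    W2 = tf.get_variable("W_2", [output_1.get_shape()[1], n])
    final_logits = tf.matmul(output_0, W1) + tf.matmul(output_1, W2)
with tf.variable_scope("optimization_part"):
    final_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=final_logits, labels=final_labels))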

answered Sep 30 '22 by Pop