
How to monitor tensor values in Theano/Keras?

I know this question has been asked in various forms, but I can't find an answer I can understand and use. Forgive me if this is a basic question; I'm new to these tools (Theano/Keras).

Problem to Solve

Monitor variables in Neural Networks (e.g. input/forget/output gate values in LSTM)

What I'm currently getting

No matter at which stage I try to read those values, I get something like:

Elemwise{mul,no_inplace}.0
Elemwise{mul,no_inplace}.0
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
Subtensor{int64}.0
Subtensor{int64}.0

Is there any way I can monitor them (e.g. print to stdout, write to a file, etc.)?

Possible Solution

It seems that callbacks in Keras can do the job, but they don't work for me either; I get the same output as above.

My Guess

It seems like I'm making a very simple mistake.

Thank you very much in advance, everyone.


ADDED

Specifically, I'm trying to monitor input/forget/output gating values in LSTM. I found that LSTM.step() is for computing those values:

def step(self, x, states):
    h_tm1 = states[0]   # hidden state of the previous time step
    c_tm1 = states[1]   # cell state from the previous time step
    B_U = states[2]     # dropout matrices for recurrent units?
    B_W = states[3]     # dropout matrices for input units?

    if self.consume_less == 'cpu':                              # just cut x into 4 pieces in columns
        x_i = x[:, :self.output_dim]
        x_f = x[:, self.output_dim: 2 * self.output_dim]
        x_c = x[:, 2 * self.output_dim: 3 * self.output_dim]
        x_o = x[:, 3 * self.output_dim:]
    else:
        x_i = K.dot(x * B_W[0], self.W_i) + self.b_i
        x_f = K.dot(x * B_W[1], self.W_f) + self.b_f
        x_c = K.dot(x * B_W[2], self.W_c) + self.b_c
        x_o = K.dot(x * B_W[3], self.W_o) + self.b_o

    i = self.inner_activation(x_i + K.dot(h_tm1 * B_U[0], self.U_i))
    f = self.inner_activation(x_f + K.dot(h_tm1 * B_U[1], self.U_f))
    c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1 * B_U[2], self.U_c))
    o = self.inner_activation(x_o + K.dot(h_tm1 * B_U[3], self.U_o))

    with open("test_visualization.txt", "a") as myfile:
        myfile.write(str(i)+"\n")

    h = o * self.activation(c)
    return h, [h, c]

As the code above shows, I tried to write the value of i to a file, but it only gave me values like:

Elemwise{mul,no_inplace}.0
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
Subtensor{int64}.0

So I tried i.eval() or i.get_value(), but both failed to give me values.

.eval() gave me this:

theano.gof.fg.MissingInputError: An input of the graph, used to compute Subtensor{::, :int64:}(<TensorType(float32, matrix)>, Constant{10}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.

and .get_value() gave me this:

AttributeError: 'TensorVariable' object has no attribute 'get_value'

So I traced the call chain (which line calls which function) and tried to get values at every step I found, but in vain.

It feels like I've fallen into some basic pitfall.

totuta asked May 05 '16 22:05


2 Answers

I use the solution described in the Keras FAQ:

http://keras.io/getting-started/faq/#how-can-i-visualize-the-output-of-an-intermediate-layer

In detail:

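from keras import backend as K

# Map the model input to the output of the layer of interest.
intermediate_tensor_function = K.function([model.layers[0].input],
                                          [model.layers[layer_of_interest].output])
# Calling it on real data returns NumPy arrays, not symbolic tensors.
intermediate_tensor = intermediate_tensor_function([thisInput])[0]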

yields:

array([[ 3.,  17.]], dtype=float32)

However, I'd like to use the functional API, but I can't seem to get the actual tensor values, only the symbolic representation. For example:

model.layers[1].output

yields:

<tf.Tensor 'add:0' shape=(?, 2) dtype=float32>

I'm missing something about the interaction of Keras and TensorFlow here, but I'm not sure what. Any insight is much appreciated.

antianticamper answered Oct 07 '22 00:10


One solution is to create a version of your network that is truncated at the LSTM layer whose gate values you want to monitor, and then replace the original layer with a custom layer whose step function is modified to return not only the hidden layer values but also the gate values.

For instance, say you want to access the gate values of a GRU. Create a custom layer GRU2 that inherits everything from the GRU class, but adapt the step function so that it returns a concatenation of the states you want to monitor, and then uses only the part containing the previous hidden layer activations when computing the next activations. I.e.:

def step(self, x, states):

    # get prev hidden layer from input that is concatenation of
    # prev hidden layer + reset gate + update gate
    x = x[:self.output_dim, :]


    ###############################################
    # This is the original code from the GRU layer
    #

    h_tm1 = states[0]  # previous memory
    B_U = states[1]  # dropout matrices for recurrent units
    B_W = states[2]

    if self.consume_less == 'gpu':

        matrix_x = K.dot(x * B_W[0], self.W) + self.b
        matrix_inner = K.dot(h_tm1 * B_U[0], self.U[:, :2 * self.output_dim])

        x_z = matrix_x[:, :self.output_dim]
        x_r = matrix_x[:, self.output_dim: 2 * self.output_dim]
        inner_z = matrix_inner[:, :self.output_dim]
        inner_r = matrix_inner[:, self.output_dim: 2 * self.output_dim]

        z = self.inner_activation(x_z + inner_z)
        r = self.inner_activation(x_r + inner_r)

        x_h = matrix_x[:, 2 * self.output_dim:]
        inner_h = K.dot(r * h_tm1 * B_U[0], self.U[:, 2 * self.output_dim:])
        hh = self.activation(x_h + inner_h)
    else:
        if self.consume_less == 'cpu':
            x_z = x[:, :self.output_dim]
            x_r = x[:, self.output_dim: 2 * self.output_dim]
            x_h = x[:, 2 * self.output_dim:]
        elif self.consume_less == 'mem':
            x_z = K.dot(x * B_W[0], self.W_z) + self.b_z
            x_r = K.dot(x * B_W[1], self.W_r) + self.b_r
            x_h = K.dot(x * B_W[2], self.W_h) + self.b_h
        else:
            raise Exception('Unknown `consume_less` mode.')
        z = self.inner_activation(x_z + K.dot(h_tm1 * B_U[0], self.U_z))
        r = self.inner_activation(x_r + K.dot(h_tm1 * B_U[1], self.U_r))

        hh = self.activation(x_h + K.dot(r * h_tm1 * B_U[2], self.U_h))
    h = z * h_tm1 + (1 - z) * hh

    #
    # End of original code
    ###########################################################


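    # concatenate states you want to monitor, in this case the
    # hidden layer activations and gates z and r
    # (named `monitored` rather than `all` to avoid shadowing the builtin)
    monitored = K.concatenate([h, z, r])

    # return everything as the output, but carry only h forward as the state
    return monitored, [h]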

(Note that the only lines I added are at the beginning and end of the function.)

If you then run your network with GRU2 as the last layer instead of GRU (with return_sequences = True for the GRU2 layer), you can simply call predict on your network; this will give you all hidden layer and gate values.
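
As a rough sketch of the retrieval step (not part of the original answer): assuming predict returns an array of shape (samples, timesteps, 3 * output_dim) with h, z and r concatenated along the last axis in that order, the pieces can be sliced apart with NumPy. The helper name split_monitored and the stand-in array are hypothetical; in practice you would pass in the result of model.predict.

```python
import numpy as np


def split_monitored(outputs, output_dim):
    """Split concatenated [h, z, r] outputs along the last axis.

    `outputs` is assumed to have shape (samples, timesteps, 3 * output_dim).
    """
    if outputs.shape[-1] != 3 * output_dim:
        raise ValueError("last axis must have size 3 * output_dim")
    h = outputs[..., :output_dim]                  # hidden activations
    z = outputs[..., output_dim:2 * output_dim]    # update gate
    r = outputs[..., 2 * output_dim:]              # reset gate
    return h, z, r


# Stand-in for model.predict(X): 2 samples, 5 time steps, output_dim = 4.
output_dim = 4
fake_outputs = np.arange(2 * 5 * 3 * output_dim, dtype="float32")
fake_outputs = fake_outputs.reshape(2, 5, 3 * output_dim)

h, z, r = split_monitored(fake_outputs, output_dim)
print(h.shape, z.shape, r.shape)
```

Each returned array is a view on the original, so this costs no extra memory even for long sequences.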

The same approach should work for LSTM, although it may take some work to figure out how to pack all the outputs you want into one vector and retrieve them again afterwards.
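
To make the LSTM bookkeeping concrete, here is a plain-NumPy sketch of the idea, not the Keras layer itself. It assumes a standard logistic sigmoid for the gates, no dropout, and weights packed into single matrices in the order i, f, c, o; the names lstm_step, W, U and b are mine. The step returns the concatenation [h, i, f, o] as its output while carrying only [h, c] forward as state, which is the same trick as in GRU2.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def lstm_step(x, h_tm1, c_tm1, W, U, b):
    """One LSTM step; returns (monitored, [h, c]).

    Assumed shapes: W (input_dim, 4n), U (n, 4n), b (4n,), gate order i, f, c, o.
    """
    n = h_tm1.shape[1]
    pre = x @ W + h_tm1 @ U + b
    i = sigmoid(pre[:, :n])                         # input gate
    f = sigmoid(pre[:, n:2 * n])                    # forget gate
    c = f * c_tm1 + i * np.tanh(pre[:, 2 * n:3 * n])
    o = sigmoid(pre[:, 3 * n:])                     # output gate
    h = o * np.tanh(c)
    # The output carries everything to monitor; the state stays [h, c].
    monitored = np.concatenate([h, i, f, o], axis=1)
    return monitored, [h, c]


rng = np.random.default_rng(0)
batch, steps, input_dim, n = 2, 3, 5, 4
W = rng.normal(size=(input_dim, 4 * n))
U = rng.normal(size=(n, 4 * n))
b = np.zeros(4 * n)

h = np.zeros((batch, n))
c = np.zeros((batch, n))
trace = []
for t in range(steps):
    x_t = rng.normal(size=(batch, input_dim))
    monitored, (h, c) = lstm_step(x_t, h, c, W, U, b)
    trace.append(monitored)

# Shape (batch, steps, 4n); split back into the four monitored pieces.
trace = np.stack(trace, axis=1)
h_seq, i_seq, f_seq, o_seq = np.split(trace, 4, axis=-1)
print(h_seq.shape, i_seq.shape, f_seq.shape, o_seq.shape)
```

The cell state c is left out of the monitored vector here; appending it would simply mean concatenating five blocks and splitting into five.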

Hope that helps!

Dieuwke answered Oct 06 '22 23:10