 

How do you apply layer normalization in an RNN using tf.keras?

I would like to apply layer normalization to a recurrent neural network using tf.keras. In TensorFlow 2.0, there is a LayerNormalization class in tf.keras.layers.experimental, but it's unclear how to use it within a recurrent layer such as LSTM at each time step (which is how it was designed to be used). Should I create a custom cell, or is there a simpler way?

For example, applying dropout at each time step is as easy as setting the recurrent_dropout argument when creating an LSTM layer, but there is no recurrent_layer_normalization argument.
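For comparison, recurrent dropout needs nothing more than this (standard tf.keras API):

from tensorflow.keras.layers import LSTM

lstm = LSTM(32, recurrent_dropout=0.2, return_sequences=True)  # dropout on the recurrent state at each time step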

MiniQuark asked Mar 29 '19 15:03

People also ask

How do you normalize a layer in TensorFlow?

A Normalization layer should always either be adapted over a dataset or passed mean and variance. During adapt(), the layer will compute a mean and variance separately for each position in each axis specified by the axis argument. To calculate a single mean and variance over the input data, simply pass axis=None.
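For example, a minimal sketch of adapt() (assuming a recent TF release where the layer is exposed as tf.keras.layers.Normalization):

import numpy as np
import tensorflow as tf

data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]], dtype=np.float32)
norm = tf.keras.layers.Normalization(axis=-1)  # one mean/variance per feature column
norm.adapt(data)                               # compute the statistics from the data
print(norm(data))                              # each column now has mean ~0 and unit variance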

What is normalization layer in Keras?

The Normalization class is a preprocessing layer which normalizes continuous features. This layer will shift and scale inputs into a distribution centered around 0 with standard deviation 1. It accomplishes this by precomputing the mean and variance of the data, and calling (input - mean) / sqrt(var) at runtime.
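Passing precomputed statistics instead of calling adapt() should give the same result as the manual (input - mean) / sqrt(var) computation, up to a small epsilon; a quick sketch:

import numpy as np
import tensorflow as tf

data = np.random.rand(100, 3).astype(np.float32)
mean, var = data.mean(axis=0), data.var(axis=0)

layer = tf.keras.layers.Normalization(mean=mean, variance=var)  # no adapt() needed
manual = (data - mean) / np.sqrt(var)
print(np.allclose(layer(data).numpy(), manual, atol=1e-5))      # True (up to epsilon)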

Why layer normalization works better in RNN?

Layer Normalization (LN) normalizes the activations along the feature dimension instead of the mini-batch dimension. This removes Batch Normalization's dependency on the batch, which is what makes layer normalization much easier to apply to RNNs.

What does TF Keras utils normalize do?

The normalize function in tf.keras.utils performs a simpler, one-off rescaling to help training: it divides each sample along the given axis by its norm (L2 by default), so every row ends up with unit norm.
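A quick illustration (each row gets divided by its own L2 norm):

import numpy as np
import tensorflow as tf

x = np.array([[3.0, 4.0], [6.0, 8.0]])
x_norm = tf.keras.utils.normalize(x, axis=-1)   # default order=2 (L2 norm)
print(x_norm)                                   # [[0.6 0.8] [0.6 0.8]]
print(np.linalg.norm(x_norm, axis=-1))          # [1. 1.]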


1 Answer

You can create a custom cell by inheriting from the SimpleRNNCell class, like this:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.activations import get as get_activation
# LayerNormalization lived in tensorflow.keras.layers.experimental in early TF 2.0
# previews; in stable TF 2.x it is available directly from tensorflow.keras.layers.
from tensorflow.keras.layers import SimpleRNNCell, RNN, LayerNormalization

class SimpleRNNCellWithLayerNorm(SimpleRNNCell):
    def __init__(self, units, **kwargs):
        activation = kwargs.pop("activation", "tanh")
        # Build the parent cell with no activation; we apply our own after layer norm.
        # Storing it under a separate name avoids clobbering self.activation,
        # which super().__init__ sets to linear when activation=None.
        super().__init__(units, activation=None, **kwargs)
        self.outer_activation = get_activation(activation)
        self.layer_norm = LayerNormalization()
    def call(self, inputs, states):
        # linear recurrent step, then layer normalization, then the activation
        outputs, new_states = super().call(inputs, states)
        norm_out = self.outer_activation(self.layer_norm(outputs))
        return norm_out, [norm_out]

This implementation runs a regular SimpleRNN cell for one step without any activation, applies layer normalization to the resulting output, and then applies the activation. You can then use the cell like this:

model = Sequential([
    RNN(SimpleRNNCellWithLayerNorm(20), return_sequences=True,
        input_shape=[None, 20]),
    RNN(SimpleRNNCellWithLayerNorm(5)),
])

model.compile(loss="mse", optimizer="sgd")
X_train = np.random.randn(100, 50, 20)  # 100 random sequences of 50 time steps, 20 features each
Y_train = np.random.randn(100, 5)       # random targets, just to verify the model trains
history = model.fit(X_train, Y_train, epochs=2)

For GRU and LSTM cells, people generally apply layer norm to the gate pre-activations (after the linear combination of the inputs and states, and before the sigmoid activation), so it's a bit trickier to implement. Alternatively, you can probably get good results by simply applying layer norm before the activation and recurrent_activation, which would be easier to implement.
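For reference, here is a rough, untested sketch of the first approach for an LSTM cell. It reuses the weights built by the parent LSTMCell and normalizes each gate's pre-activation; the class and attribute names are my own, it assumes Keras's i/f/c/o gate ordering, and it ignores dropout for brevity:

import tensorflow as tf
from tensorflow.keras.layers import LSTMCell, LayerNormalization

class LSTMCellWithLayerNorm(LSTMCell):
    def __init__(self, units, **kwargs):
        super().__init__(units, **kwargs)
        # one LayerNormalization per gate pre-activation, plus one for the cell state
        self.gate_norms = [LayerNormalization() for _ in range(4)]
        self.state_norm = LayerNormalization()

    def call(self, inputs, states, training=None):
        h_prev, c_prev = states
        # linear combination of inputs and previous hidden state for all four gates at once
        z = tf.matmul(inputs, self.kernel) + tf.matmul(h_prev, self.recurrent_kernel)
        if self.use_bias:
            z = tf.nn.bias_add(z, self.bias)
        z_i, z_f, z_c, z_o = tf.split(z, 4, axis=-1)            # Keras orders gates as i, f, c, o
        i = self.recurrent_activation(self.gate_norms[0](z_i))  # input gate
        f = self.recurrent_activation(self.gate_norms[1](z_f))  # forget gate
        g = self.activation(self.gate_norms[2](z_c))            # candidate cell state
        o = self.recurrent_activation(self.gate_norms[3](z_o))  # output gate
        c = f * c_prev + i * g
        h = o * self.activation(self.state_norm(c))
        return h, [h, c]

This normalizes the gate pre-activations before the sigmoid/tanh, as described above, rather than the simpler post-output normalization used in the SimpleRNN cell.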

MiniQuark answered Oct 10 '22 03:10