I want to use a ReLU activation for my simple RNN in a TensorFlow model I am building. It sits on top of a deep convolutional network, and I am trying to classify a sequence of images. I noticed that the default activation in both the Keras and TensorFlow source code is tanh for simple RNNs. Is there a reason for this? Is there anything wrong with using ReLU? It seems like ReLU would help more with vanishing gradients.
nn = tf.nn.rnn_cell.BasicRNNCell(1024, activation=tf.nn.relu)
activation: Activation function to use. Default: hyperbolic tangent (tanh).
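For reference, overriding that default in tf.keras looks like this, a minimal sketch assuming TensorFlow 2.x, with the 1024-unit size simply mirroring the cell above:

import tensorflow as tf

# SimpleRNN defaults to activation='tanh'; passing 'relu' overrides it.
rnn_layer = tf.keras.layers.SimpleRNN(1024, activation='relu')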
The output from tanh can be positive or negative, allowing for both increases and decreases in the state. That's why tanh is used to compute the candidate values that get added to the LSTM's internal state. The GRU, the LSTM's cousin, doesn't have a second tanh, so in a sense the second one is not necessary.
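To make that concrete, here is a small illustrative sketch (NumPy only, with made-up gate values) of one LSTM-style state update, c_t = f * c_prev + i * candidate. A tanh candidate can move the state in either direction, while a ReLU candidate can only add non-negative values:

import numpy as np

c_prev = np.array([0.5, -0.2, 0.8])           # previous cell state (made up)
f = np.array([0.9, 0.9, 0.9])                 # forget gate values (made up)
i = np.array([0.5, 0.5, 0.5])                 # input gate values (made up)
pre_activation = np.array([-1.0, 0.3, -2.0])  # pre-activation of the candidate

tanh_candidate = np.tanh(pre_activation)          # values in (-1, 1)
relu_candidate = np.maximum(pre_activation, 0.0)  # values in [0, inf)

print(f * c_prev + i * tanh_candidate)  # state can decrease as well as increase
print(f * c_prev + i * relu_candidate)  # additions are never negative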
Activation: this parameter sets the element-wise activation function used in the Dense layer. By default it is set to None, which means the layer applies a linear (identity) activation.
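For example (a minimal tf.keras sketch; the layer size of 10 is arbitrary):

import tensorflow as tf

linear_head = tf.keras.layers.Dense(10)                   # activation=None, i.e. linear
relu_head = tf.keras.layers.Dense(10, activation='relu')  # explicit non-linearity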
RNNs can suffer from both exploding gradient and vanishing gradient problems. When the sequence to learn is long, this becomes a very delicate balance that can tip into one or the other quite easily. Both problems are caused by exponentiation: at each time step the gradient is multiplied by the recurrent weight matrix and by the derivative of the activation, so if the magnitude of either factor is different from 1.0, there will be a tendency towards exploding or vanishing.
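A quick back-of-the-envelope illustration of that exponentiation (pure Python, with made-up per-step factors):

# Repeated multiplication by a per-step factor over 100 time steps.
# A factor slightly below 1.0 vanishes, slightly above 1.0 explodes.
steps = 100
for factor in (0.9, 1.0, 1.1):
    print(factor, factor ** steps)
# 0.9 -> ~2.7e-05 (vanishing), 1.0 -> 1.0, 1.1 -> ~1.4e+04 (exploding)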
ReLUs do not help with exploding gradient problems. In fact they can be worse than saturating activation functions such as sigmoid or tanh, whose outputs are naturally bounded when the weights are large.
ReLUs do help with vanishing gradient problems. However, the designs of the LSTM and GRU cells are also intended to address the same problem (learning from potentially weak signals many time steps away), and they do so very effectively.
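If you do go the gated-cell route, the swap in tf.keras is a drop-in replacement (the 1024-unit size here is just a placeholder matching the question, not a recommendation):

import tensorflow as tf

simple = tf.keras.layers.SimpleRNN(1024, activation='relu')
lstm = tf.keras.layers.LSTM(1024)  # gated cell, keeps its default tanh/sigmoid
gru = tf.keras.layers.GRU(1024)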
For a simple RNN with short time series, there should be nothing wrong with using a ReLU activation. To address the possibility of exploding gradients during training, you could look at gradient clipping (treating gradient components outside an allowed range as the min or max of that range).
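In tf.keras, gradient clipping can be requested directly on the optimizer; a minimal sketch (the 1.0 thresholds are arbitrary):

import tensorflow as tf

# clipvalue clips each gradient component to [-1.0, 1.0];
# clipnorm rescales the whole gradient if its norm exceeds the threshold.
opt_by_value = tf.keras.optimizers.Adam(clipvalue=1.0)
opt_by_norm = tf.keras.optimizers.Adam(clipnorm=1.0)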