 

Why the 6 in relu6?

Tags:

tensorflow

I've hacked together a deep feed-forward NN from scratch in R, and it seems more stable with "hard sigmoid" activations - max(0, min(1, x)) - than with ReLU. While trying to port it to TensorFlow, I noticed that this activation function isn't built in, only relu6, which uses an upper cutoff at 6. Is there a reason for this? (I realize that you could do relu6(x*6)/6, but if the TF guys put the 6 there for a good reason, I'd like to know.) Also, I'd like to know whether others have explosion problems with ReLU in feed-forward nets (I'm aware of the RNN issues).
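For reference, here is a minimal sketch (assuming TensorFlow 2 with eager execution) of the workaround mentioned above: building the hard sigmoid max(0, min(1, x)) out of the built-in tf.nn.relu6. The hard_sigmoid helper name is my own, not a TF built-in.

```python
import tensorflow as tf

# Workaround from the question: max(0, min(1, x)) == relu6(6 * x) / 6.
# The helper name hard_sigmoid is illustrative, not a built-in TF function.
def hard_sigmoid(x):
    return tf.nn.relu6(6.0 * x) / 6.0

x = tf.constant([-1.0, 0.25, 0.5, 2.0])
print(hard_sigmoid(x).numpy())  # [0.   0.25 0.5  1.  ]
```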

asked Nov 10 '17 by FaultyBagnose

People also ask

What is ReLU6?

ReLU6 is a modification of the rectified linear unit where the activation is limited to a maximum size of 6. This gives increased robustness when used with low-precision computation. Source: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

What is ReLU6 used for?

ReLU6 is an activation function commonly used in deep convolutional neural networks. It comes up fairly often in mobile machine learning because it is used in Google's optimized MobileNet architecture and can cause errors when trying to convert such models to run on device.

What kind of an activation function is ReLU6?

The ReLU6 activation function restricts any input value of 6 or greater to the value 6 (hence the name). It is made up of three linear components, but it is a non-linear function overall.

What is SoftPlus?

SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive.
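As a quick illustration (assuming TensorFlow 2 with eager execution), softplus(x) = log(1 + exp(x)) stays strictly positive and approaches ReLU for large inputs:

```python
import tensorflow as tf

# softplus(x) = log(1 + exp(x)): a smooth, strictly positive approximation of relu(x).
x = tf.constant([-2.0, 0.0, 2.0])
print(tf.math.softplus(x).numpy())  # ~[0.127 0.693 2.127]
print(tf.nn.relu(x).numpy())        # [0. 0. 2.]
```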


2 Answers

From this reddit thread:

This is useful in making the networks ready for fixed-point inference. If you unbound the upper limit, you lose too many bits to the Q part of a Q.f number. Keeping the ReLUs bounded by 6 will let them take a max of 3 bits (upto 8) leaving 4/5 bits for .f

It seems, then, that 6 is just an arbitrary value chosen according to the number of bits you want to be able to compress your network's trained parameters into. As for why only the version with the value 6 is implemented, I assume it's because that's the value that fits best in 8 bits, which is probably the most common use case.
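To make the fixed-point argument concrete, here is a small sketch of my own (not from the answer, and not TensorFlow's actual quantization pipeline): with activations bounded to [0, 6], 3 integer bits cover the whole range, leaving the rest of an 8-bit word for the fractional part.

```python
import tensorflow as tf

# Illustrative only: a toy Q3.5 fixed-point scheme for activations bounded to [0, 6].
x = tf.constant([-2.0, 0.7, 3.14159, 10.0])
y = tf.nn.relu6(x)                          # bounded to [0, 6]

scale = 2 ** 5                              # 3 integer bits + 5 fractional bits = 8 bits
q = tf.cast(tf.round(y * scale), tf.uint8)  # quantize: 6.0 * 32 = 192 fits in uint8
deq = tf.cast(q, tf.float32) / scale        # dequantize
print(y.numpy())    # [0.      0.7     3.14159 6.     ]
print(deq.numpy())  # same values, within 1/32
```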

answered Oct 01 '22 by GPhilo


TensorFlow's documentation (https://www.tensorflow.org/api_docs/python/tf/nn/relu6) points to the following paper:

... First, we cap the units at 6, so our ReLU activation function is y = min(max(x, 0), 6). In our tests, this encourages the model to learn sparse features earlier. In the formulation of [8], this is equivalent to imagining that each ReLU unit consists of only 6 replicated bias-shifted Bernoulli units, rather than an infinite amount. We will refer to ReLU units capped at n as ReLU-n units.

http://www.cs.utoronto.ca/~kriz/conv-cifar10-aug2010.pdf

Since it originates from this paper, I suspect that they tested different values of n and got the best results on their test set with n = 6.
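For completeness, here is a sketch of the paper's general "ReLU-n" (cap at n), of which relu6 is the n = 6 case. The relu_n helper below is my own; only tf.nn.relu6 ships with TensorFlow.

```python
import tensorflow as tf

# "ReLU-n" from the paper: y = min(max(x, 0), n). relu6 is the special case n = 6.
# The function name relu_n is illustrative, not a TensorFlow built-in.
def relu_n(x, n=6.0):
    return tf.minimum(tf.maximum(x, 0.0), n)

x = tf.constant([-3.0, 2.0, 7.5])
print(relu_n(x).numpy())       # [0. 2. 6.]
print(tf.nn.relu6(x).numpy())  # identical for n = 6
```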

answered Oct 01 '22 by Rick