
How is Hard Sigmoid defined?

I am working on deep nets using Keras. There is an activation called "hard sigmoid". What is its mathematical definition?

I know what the sigmoid is. Someone asked a similar question on Quora: https://www.quora.com/What-is-hard-sigmoid-in-artificial-neural-networks-Why-is-it-faster-than-standard-sigmoid-Are-there-any-disadvantages-over-the-standard-sigmoid

But I could not find the precise mathematical definition anywhere.

asked Feb 15 '16 by Anuj Gupta

People also ask

What is the hard sigmoid function?

The hard sigmoid is an activation function used in neural networks, of the form f(x) = max(0, min(1, (x + 1)/2)). Source: BinaryConnect: Training Deep Neural Networks with binary weights during propagations.

What is the benefit of using a sigmoid function over a hard limit function?

The hyperbolic tangent function is similar, but its output range is (-1, +1). Its advantage over the sigmoid function is that its derivative is steeper around zero, so gradients are larger over a wider range of inputs, which can make learning faster.

What is hard swish?

Hard Swish is a type of activation function based on Swish, but it replaces the computationally expensive sigmoid with a piecewise linear analogue: h-swish(x) = x · ReLU6(x + 3) / 6. Source: Searching for MobileNetV3.

What does a sigmoid curve mean?

an S-shaped curve that describes many processes in psychology, including learning and responding to test items. The curve starts low, has a period of acceleration, and then approaches an asymptote. Often, the curve is characterized by the logistic function.




3 Answers

Since Keras supports both TensorFlow and Theano, the exact implementation might differ for each backend; I'll cover Theano only. For the Theano backend, Keras uses T.nnet.hard_sigmoid, which is in turn a linear approximation of the standard sigmoid:

# excerpt from Theano's hard_sigmoid: a linear transform clipped to [0, 1]
slope = tensor.constant(0.2, dtype=out_dtype)
shift = tensor.constant(0.5, dtype=out_dtype)
x = (x * slope) + shift
x = tensor.clip(x, 0, 1)

i.e. it is: max(0, min(1, x*0.2 + 0.5))
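For illustration, here is a minimal NumPy sketch of the same approximation (the name keras_hard_sigmoid is just for this example, not an actual Keras/Theano function):

import numpy as np

def keras_hard_sigmoid(x):
    # piecewise linear approximation of the logistic sigmoid:
    # slope 0.2, shift 0.5, result clipped to [0, 1]
    return np.clip(0.2 * np.asarray(x, dtype=float) + 0.5, 0.0, 1.0)

print(keras_hard_sigmoid([-5.0, -2.5, 0.0, 2.5, 5.0]))  # [0.  0.  0.5 1.  1. ]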

answered by Serj Zaharchenko


The hard sigmoid is normally a piecewise linear approximation of the logistic sigmoid function. Depending on what properties of the original sigmoid you want to keep, you can use a different approximation.

I personally like to keep the function correct at zero, i.e. σ(0) = 0.5 (shift) and σ'(0) = 0.25 (slope). This could be coded as follows:

import numpy as np

def hard_sigmoid(x):
    return np.maximum(0, np.minimum(1, (x + 2) / 4))
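As a quick, purely illustrative check, this agrees with the logistic sigmoid's value at zero and has slope 1/4 on its linear segment:

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(hard_sigmoid(x))  # [0.   0.25 0.5  0.75 1.  ]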
answered by Mr Tsjolder


For reference, the hard sigmoid function may be defined differently in different places. In Courbariaux et al. 2016 [1] it's defined as:

σ is the “hard sigmoid” function: σ(x) = clip((x + 1)/2, 0, 1) = max(0, min(1, (x + 1)/2))

The intent is to provide a probability value (hence the constraint to lie between 0 and 1) for use in stochastic binarization of neural network parameters (e.g. weights, activations, gradients). You use the probability p = σ(x) returned by the hard sigmoid to set the parameter x to +1 with probability p, or to -1 with probability 1 - p.
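As a rough sketch of that procedure (illustrative only; the function names are mine, not from the paper's code):

import numpy as np

def hard_sigmoid(x):
    # clip((x + 1) / 2, 0, 1), as defined in Courbariaux et al. 2016
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)

def stochastic_binarize(x, rng=None):
    # binarize each entry of x to +1 with probability p = hard_sigmoid(x), else -1
    rng = np.random.default_rng() if rng is None else rng
    p = hard_sigmoid(np.asarray(x, dtype=float))
    return np.where(rng.random(p.shape) < p, 1.0, -1.0)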

[1] https://arxiv.org/abs/1602.02830 - "Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1", Matthieu Courbariaux, Itay Hubara, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio, (Submitted on 9 Feb 2016 (v1), last revised 17 Mar 2016 (this version, v3))

answered by phoenixdown