 

What does "relu" stand for in tf.nn.relu?

In its API documentation, it says "Computes rectified linear".

Is it Re(ctified) L(inear)... what is U then?

asked Apr 19 '17 by aerin

2 Answers

Re(ctified) L(inear) (U)nit

Usually a layer in a neural network takes some input, say a vector, and multiplies it by a weight matrix, producing another vector.

Each value in the result (usually a float) is then considered an output. However, most layers in modern neural networks also apply a nonlinearity: an add-on function that, you might say, adds expressive power to these output values. For a long time these were sigmoids and tanhs.

More recently, people use a function that returns 0 if the input is negative, and the input itself if the input is 0 or positive. This specific add-on function (better called an "activation function") is the ReLU.
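For concreteness, here is a minimal sketch (assuming TensorFlow 2.x with eager execution) of what tf.nn.relu does element-wise:

```python
import tensorflow as tf

# A toy input vector with negative, zero, and positive values.
x = tf.constant([-2.0, -0.5, 0.0, 0.5, 2.0])

# tf.nn.relu applies max(0, x) element-wise:
# negative entries become 0, non-negative entries pass through unchanged.
y = tf.nn.relu(x)

print(y)  # tf.Tensor([0.  0.  0.  0.5 2. ], shape=(5,), dtype=float32)
```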

answered Oct 02 '22 by Phillip Bock

On top of Friesel's answer, I'd like to add two important characteristics of Relu.

1. It is NOT differentiable at x = 0.

ReLU's graph is pointy at the origin, not curvy: the plot of f(x) = max(0, x) has a kink at 0.

It is defined as f(x) = max(0, x), so it is not differentiable at x = 0 (the left and right derivatives there are 0 and 1).

2. The derivative of ReLU is very simple! Simpler even than the sigmoid's derivative, which is σ(x)(1 − σ(x)).

The derivative of ReLU:
 1 if x > 0
 0 otherwise 

It's the simplest nonlinearity, and it's the one we mostly use on hidden layers. Think about how easy backpropagation becomes, as in the sketch below.
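A minimal NumPy sketch of that point (the helper names relu and relu_grad are just illustrative): the backward pass through ReLU is only a comparison and a multiply.

```python
import numpy as np

def relu(x):
    # Forward pass: f(x) = max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Backward pass: derivative is 1 where x > 0, 0 otherwise
    # (the kink at x = 0 is conventionally given a gradient of 0).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
upstream = np.ones_like(x)          # gradient flowing back from the next layer
print(relu(x))                      # [0.  0.  0.  0.5 2. ]
print(upstream * relu_grad(x))      # [0. 0. 0. 1. 1.]
```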

answered Oct 02 '22 by aerin