
Big difference when using Relu over Tanh on a simple problem

I was testing a toy problem: the input is a vector of zeros and ones, and the output is whether the number of ones is odd or even (simplicity itself). With an MLP that uses Tanh activation, I never managed to get beyond random-guess performance (~50%)! Purely by chance, I tried ReLU (out of desperation), and... it worked perfectly (reaching 100% accuracy most of the time).

Then, while discussing it with a friend, we wanted to see what would happen if we replaced the zeros with -1 (the task stays the same: odd or even number of ones). To my sheer surprise, it worked with Tanh (accuracy between 75% and 90%). ReLU still performs better.

The code

import numpy as np
from sklearn.neural_network import MLPClassifier
# from sklearn.preprocessing import StandardScaler
def generate_data(batch_size, data_length=10, zeros=True):
    x = np.random.randint(0, 2, (batch_size, data_length))

    y = x.sum(axis=1) % 2
    y = y.astype(np.int16).reshape(-1)

    if not zeros: # in this case, convert the zeros to -1
        x[x==0] = -1 
    return x, y

# With ReLU the accuracy is perfect; with Tanh it stays at chance level (~50%)
# clf = MLPClassifier(solver='adam', verbose=True, batch_size=512, activation="relu")
clf = MLPClassifier(solver='adam', verbose=True, batch_size=512, activation="tanh")

X_train, y_train = generate_data(batch_size=10000, data_length=10, zeros=True)
X_test, y_test = generate_data(batch_size=1000, data_length=10, zeros=True)

clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

To get -1 instead of zeros, just pass zeros=False when calling the generate_data function.
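To compare the four combinations (Tanh or ReLU, 0/1 or -1/+1 inputs) in one run, a small harness along these lines may help. It is only a sketch: the `compare_settings` helper, the `rng` argument added to `generate_data`, and the fixed `random_state` are assumptions introduced here for reproducibility, not part of the original code, and the scores will vary with the seed.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def generate_data(batch_size, data_length=10, zeros=True, rng=None):
    # Same task as in the question: label is 1 when the number of ones is odd.
    rng = np.random.default_rng() if rng is None else rng
    x = rng.integers(0, 2, size=(batch_size, data_length))
    y = x.sum(axis=1) % 2
    if not zeros:  # encode the inputs as -1/+1 instead of 0/1
        x[x == 0] = -1
    return x, y


def compare_settings(n_train=10000, n_test=1000, seed=0, **mlp_kwargs):
    # Hypothetical helper: fits one classifier per (activation, encoding) pair
    # and returns the test accuracy of each.
    rng = np.random.default_rng(seed)
    scores = {}
    for zeros in (True, False):
        X_train, y_train = generate_data(n_train, zeros=zeros, rng=rng)
        X_test, y_test = generate_data(n_test, zeros=zeros, rng=rng)
        encoding = "0/1" if zeros else "-1/+1"
        for activation in ("tanh", "relu"):
            clf = MLPClassifier(solver="adam", batch_size=512,
                                activation=activation, random_state=seed,
                                **mlp_kwargs)
            clf.fit(X_train, y_train)
            scores[(activation, encoding)] = clf.score(X_test, y_test)
    return scores
```

Calling `compare_settings()` returns a dictionary keyed by `(activation, encoding)`; any extra keyword arguments are forwarded to `MLPClassifier`.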

Can someone please explain what is happening here?

Edit: Thanks to @BlackBear and @Andreas K. for their answers. Apparently, using Tanh leads the neurons to saturate (the gradients become vanishingly small, so the weights barely move). With a better choice of learning rate, or by letting the network optimize for longer, it does work. For example, after changing the classifier settings to

clf = MLPClassifier(solver='adam', verbose=True, batch_size=512, activation="tanh", max_iter=5000, learning_rate="adaptive", n_iter_no_change=100)

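It always works! (Note that in scikit-learn, learning_rate="adaptive" only takes effect with solver='sgd', so with adam it is presumably the longer training allowed by max_iter and n_iter_no_change that helps.)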
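The saturation mentioned in the edit can be illustrated numerically. The sketch below only shows the general property that the derivative of tanh, 1 - tanh(z)^2, shrinks quickly as |z| grows, while the ReLU derivative stays at 1 for positive z; it does not by itself prove that saturation is what happens in this particular network. The helper names `tanh_grad` and `relu_grad` are illustrative.

```python
import numpy as np


def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2, which approaches 0 for large |z|.
    return 1.0 - np.tanh(z) ** 2


def relu_grad(z):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return (np.asarray(z) > 0).astype(float)


z = np.array([0.0, 1.0, 2.0, 5.0, 10.0])
for zi, gt, gr in zip(z, tanh_grad(z), relu_grad(z)):
    print(f"z = {zi:5.1f}   tanh'(z) = {gt:.2e}   relu'(z) = {gr:.0f}")
```

When a pre-activation sums ten 0/1 inputs with weights of the same sign, it can easily land in the flat region of tanh, which is one plausible reason the -1/+1 encoding (whose inputs tend to cancel) behaves differently.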

OSM asked Jun 04 '26 09:06


1 Answer

This is just an issue with the optimization procedure, which is not able to find good values for the weights; an MLP is in principle able to represent the function. You can construct a network with 2^10 = 1024 neurons in the hidden layer, one for each input pattern, and let the output neuron respond to the hidden neurons whose patterns contain an even number of ones. With this construction, you can model any Boolean function of the inputs.
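The construction described above can be written out by hand in NumPy. This is a minimal sketch under some assumptions: it uses a hard threshold as the hidden activation (rather than Tanh or ReLU) to keep the argument simple, it assumes 0/1 inputs, and the names `build_lookup_network` and `predict_even` are made up for illustration. Each hidden unit fires only for its own pattern, and the output sums the units whose pattern has an even number of ones.

```python
import itertools

import numpy as np


def build_lookup_network(n_bits=10):
    # One hidden unit per input pattern: 2**n_bits rows of 0/1 values.
    patterns = np.array(list(itertools.product([0, 1], repeat=n_bits)))
    # Weight +1 where the pattern has a one, -1 where it has a zero, so that
    # x @ w reaches its maximum (the pattern's number of ones) only at x == pattern.
    W = 2 * patterns - 1
    # Threshold just below that maximum; every other input scores at least 1 lower.
    b = patterns.sum(axis=1) - 0.5
    # Output weights: 1 for patterns with an even number of ones, else 0.
    v = (patterns.sum(axis=1) % 2 == 0).astype(int)
    return W, b, v


def predict_even(x, W, b, v):
    # Hidden layer with a hard threshold: exactly one unit is active per input.
    hidden = (x @ W.T - b > 0).astype(int)
    # Output is 1 when the active unit belongs to an even-count pattern.
    return hidden @ v


W, b, v = build_lookup_network(10)
X = np.random.randint(0, 2, (1000, 10))
pred = predict_even(X, W, b, v)
print((pred == (X.sum(axis=1) % 2 == 0)).mean())  # fraction of correct predictions
```

Replacing `v` with any other 0/1 vector of length 1024 yields any other Boolean function of the ten inputs, which is the sense in which the construction is universal. Note that the output here is 1 for an even count, the opposite of the label `y` in the question's code.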

BlackBear answered Jun 06 '26 23:06


